Abstract



Reliable communication over the discrete-input/continuous-output noncoherent multiple-input multiple-output
(MIMO) Rayleigh block fading channel is considered when the signal-to-noise ratio (SNR) per degree of freedom
is low. Two key problems are posed and solved to obtain the optimum discrete input. In both problems, the average
and peak power per space-time slot of the input constellation are constrained. In the first one, the peak power to
average power ratio (PPAPR) of the input constellation is held fixed, while in the second problem, the peak power
is fixed independently of the average power. In the first, PPAPR-constrained problem, the mutual information,
which grows as $O(\mathrm{SNR}^2)$, is maximized up to second order in SNR. In the second, peak-constrained problem,
where the mutual information behaves as $O(\mathrm{SNR})$, the structure of constellations that are optimal up to first order,
or equivalently, that minimize energy/bit, is explicitly characterized. Furthermore, among constellations that are
first-order optimal, those that maximize the mutual information up to second order, or equivalently, the wideband
slope, are characterized. In both PPAPR-constrained and peak-constrained problems, the optimal constellations are
obtained in closed form as solutions to non-convex optimizations, and interestingly, they are found to be identical.
Due to its special structure, the common solution is referred to as Space Time Orthogonal Rank one Modulation, or
STORM. In both problems, it is seen that STORM provides a sharp characterization of the behavior of noncoherent
MIMO capacity.

Key Words: capacity, constellation design, energy/bit, low SNR, MIMO, noncoherent communication,
non-convex optimization, peak-to-average power ratio, peak-power, Rayleigh fading, STORM, wideband slope.

I. Introduction 

In this paper, we consider the problem of communicating reliably over a MIMO block Rayleigh fading
channel in the low SNR regime. We assume the noncoherent model, wherein neither the transmitter
nor the receiver is assumed to have instantaneous channel state information (CSI), while both have
knowledge of the channel distribution. In scenarios where the mobile receivers are moving at a high 
speed or when the number of transmit antennas is large, channel estimation at the receiver might be 
insufficient due to the small coherence times involved. The problem of the receiver acquiring CSI is 
further exacerbated in the low SNR regime, where the channel estimates can be unreliable. As a result, 
the more common assumption of perfect CSI at the receiver, namely that of coherent communications, 
may not hold true in such cases. 

A more fundamental rationale for studying the noncoherent model is as follows. Since in practice the 
channel is not known to the receiver at the start of communication, an information theoretic formulation of 
the noncoherent problem, which implicitly accounts for the resources needed for implicit channel
estimation without constraining the transmission scheme in any way, is more fundamental than the coherent
formulation. Systems that assume coherent transmission, on the argument that the channel can be acquired at
the receiver through pilot-symbol assisted transmission and explicit channel estimation, are
inherently suboptimal in general. Moreover, they often fail to account for the resources, namely energy and
degrees of freedom, needed for the pilot transmissions.

The study of noncoherent fading channels at low SNR is motivated by their application in wideband 
(WB) and ultra-wideband (UWB) channels. In such scenarios, the signal power is spread over a large 
bandwidth, rendering the SNR per degree of freedom low. Transmissions over wideband fading channels 
experience both time and frequency selectivity. However, within a short window of time or frequency, the 
channel fading coefficients are known to be highly correlated. One widespread approach to dealing
with frequency-selectivity is therefore to divide the original wideband channel into several parallel narrowband
channels such that each narrowband channel experiences flat fading or a single tap coefficient. To deal
with time-selectivity, a common approach is to model each narrowband channel through block fading. In
the block fading model, the channel coefficients are assumed fixed for a duration in time following which
they assume independent and identically distributed realizations (here adequate interleaving across time
and frequency windows is implicitly assumed). In this work, we model the wideband channel as a block
faded narrowband channel in the low SNR regime. This simplifying channel modeling assumption helps
capture the essence of the original wideband channel, and is widely adopted in the analysis of MIMO
fading channels. 

The study of noncoherent SISO fading channels at low SNR dates back to the 1960s. Two equivalent
notions of optimality in the literature that are indicators of energy efficiency in the low SNR regime are
(1) the input being first order optimal with respect to Shannon capacity or (2) the input achieving the
minimum energy per bit, or $(E_b/N_0)_{\min}$, required for reliable communication. A classical result by Shannon
[1] is that in the limit of infinite bandwidth or vanishing SNR, the minimum energy/bit required for
reliable communications over an AWGN channel is -1.59 dB. Early work by Kennedy [2], Jacobs [3] (also
see Gallager [4] and the references therein) studied wideband SISO Rayleigh fading channels with an
average power constrained input and showed that in the limit of infinite bandwidth or vanishing SNR, the
required minimum energy/bit is again -1.59 dB, the same as that of an AWGN channel. A remarkable
observation then was that the minimum energy/bit required is the same whether or not the receiver has
knowledge of the channel fading coefficients. Telatar and Tse [5], and Verdú [6] show that the minimum
energy/bit is -1.59 dB even for fairly general multipath SISO fading channels and general MIMO fading
channels, respectively. A common approach adopted to obtain $(E_b/N_0)_{\min}$ for fading channels is to consider
the achievable rate of a certain scheme (often M-ary Frequency Shift Keying or MFSK), which is
transmitted at arbitrarily low duty cycles (cf. [2,4,5]). The required result is then obtained by either showing
that the energy/bit of the scheme at vanishing SNR matches that of the AWGN channel, or by deriving an
upper bound on capacity that is tight with respect to the achievable lower bound. However, this approach
fixes the input a priori, and therefore no determination can be made as to the necessary conditions for
a constellation to achieve the minimum energy/bit. The characterization of the class of signals (more
generally, input distributions) that are both necessary and sufficient to achieve the minimum energy/bit
had been an important and long standing open problem.

Signals such as arbitrarily low duty-cycled FSK tend to have prohibitively large peak-to-average-power 
ratios (PAPR) and are consequently difficult to implement in practice. Such signals are therefore referred 
to as "peaky" signals in the literature. Using certain types of fourth moments of the input as measures 
of peakiness, Médard and Gallager [7], and Subramanian and Hajek [8] showed that signaling that is not
peaky in either time or frequency dimensions cannot achieve the minimum energy/bit as $\mathrm{SNR} \to 0$. Verdú
[6] formalized this notion further for fairly general noncoherent MIMO fading channels and established
that flash signaling, where the input distribution converges to a mass at zero and a non-zero mass that is
transmitted with vanishing probability as $\mathrm{SNR} \to 0$, is both necessary and sufficient to achieve the
minimum energy/bit. While noncoherent communications is sufficient to transmit at the AWGN minimum
energy/bit of -1.59 dB, the work in [6] resolves another major difficulty. It introduces and explains the
crucial role of wideband slope ($S_0$) at large but finite bandwidths. The wideband slope is a measure of
how fast the energy/bit of the optimal scheme approaches the minimum energy/bit, and is synonymous 
with the notion of second order optimality with respect to Shannon capacity. One main result of [6] 
is that for noncoherent MIMO channels with an average power constraint, the wideband slope is zero. 
This result implies that to approach the minimum energy/bit, the bandwidth for reliable noncoherent 
communications becomes prohibitively large and the associated signaling scheme prohibitively peaky, 
and therefore no realistic (i.e., bandwidth limited and peak-limited) scheme can achieve the minimum 
energy/bit. 

Hence it was important to pose problems that provide meaningful second-order performance when
considering noncoherent fading channels at low SNR. One way was to impose suitable peak-constraints
on the input. It is shown in Rao and Hassibi [9] that under certain regularity conditions on the signal,
which include making the fourth and sixth moments finite, the noncoherent MIMO capacity grows as
$O(\mathrm{SNR}^2)$. Similar expressions for the mutual information up to the second order are obtained in closed
form in [10, 11] with different assumptions on the fading matrices and peak-power constraints. Even
though such problems have capacity behaving as $O(\mathrm{SNR}^2)$, and hence the minimum energy/bit not
occurring at a vanishing SNR, they are important since they involve practical modulation schemes with
reasonable PAPR. Schemes designed to satisfy such regularity conditions must be deployed in the
vicinity of the SNR where the minimum energy/bit is achieved. Also relevant is the interesting case of the
peak-constraint imposed being independent of the average power constraint, resulting in $O(\mathrm{SNR})$ growth
of capacity. In this case, it will be shown here that the wideband slope is not zero anymore (unlike the
average power constraint only problem). Therefore, the energy/bit approaches the minimum energy/bit
at a non-zero rate as $\mathrm{SNR} \to 0$. Gursoy and Verdú [12] consider SISO Rician fast fading channels and
impose different peak power constraints in addition to the average power constraint on the input. For
certain combinations of peak and average-power constraints, they characterize the $(E_b/N_0)_{\min}$ and $S_0$ for SISO
Rician fast fading channels. For a combination of peak and average power constraints, they show that
On-Off Quadrature Phase Shift Keying (OOQPSK) achieves the minimum energy per bit as well as the
optimal wideband slope for the noncoherent SISO Rician fast fading channel. This result is obtained
in [12] by directly evaluating a second order expansion of mutual information for OOQPSK, and this
approach cannot be extended to more general MIMO block fading models. To the best of the authors'
knowledge, this is the only input distribution reported in the literature that is both first and second order
optimal, in the context of peak-constrained noncoherent communications over fading channels.

Abou-Faycal et al. [13] consider a noncoherent SISO Rayleigh fast fading channel and prove that the
capacity achieving distribution is discrete with a finite number of points, one of them being at the origin.
In [14], the authors consider a SISO Rician fast fading channel and show that the capacity achieving
distribution is discrete even when certain types of peak-constraints are imposed. While there is no formal
proof of the discreteness of the optimal input for MIMO Rayleigh fading channels, it is expected to be
the case. Despite these results, discrete input optimization of information theoretic measures is rarely
considered since the optimizations encountered are often seen as being analytically intractable. Another
compelling reason for considering the problem of maximizing mutual information as a finite dimensional
optimization over a discrete and finite cardinality input is that the solution, if obtainable, would offer
insights simultaneously into information theoretic as well as coding-modulation aspects. Indeed,
even when capacity achieving probability distribution functions are found, the problem of practical
transmission would still be unresolved, as it would not be clear how the choice of a quantization of the
optimum input would affect performance. Some recent works that deal with discrete signal constellation
design using information theoretic criteria, but only under an average power constraint and via numerical
optimization techniques, are [15-17]. While the results in [16] provided numerically computable tight
lower bounds on capacity of the noncoherent MIMO channel, the associated constellations may be hard
to implement in practice due to their limited analytical structure and lack of strict peak or peak-to-average
power ratio constraints.

In this paper, we pose and solve two key problems of obtaining the optimum discrete input of finite 
cardinality for peak-constrained MIMO noncoherent block Rayleigh fading channels in closed form. 
Given the results of [13, 14], it is expected that there will be no loss in optimizing over discrete inputs 
as opposed to input distribution functions. In both problems, we assume average power constraints on 
the input. In addition, we also assume natural peak constraints per antenna and per time slot, which 
closely emulate constraints on power amplifiers, instead of fourth and higher order moment constraints 
on the input used in [7-9]. In the first problem, the peak power to average power ratio (PPAPR) of 
the input constellation is held fixed, while in the second, the peak power is fixed independently of the 
average power. We refer to these two problems as the PPAPR constrained and peak-constrained cases, 
respectively. We show that interestingly, in the case of the noncoherent MIMO Rayleigh fading channel 
at low SNR, such joint optimizations of information theoretic metrics over complex signal matrices and
their respective probabilities are indeed analytically tractable and result in elegant closed form solutions.

In the PPAPR constrained case, it can be shown that the input satisfies certain regularity conditions 
specified in [9]. For such inputs, the mutual information is obtained up to second order in [9] and shown 
to grow as $O(\mathrm{SNR}^2)$. In one of the key contributions here, we maximize this second order mutual
information jointly over the matrix-valued elements of a finite input constellation and their probabilities,
when the cardinality of the constellations is no greater than $T + 1$, where $T$ is the channel coherence
blocklength. 

In the peak constrained case, the mutual information behaves as $O(\mathrm{SNR})$. Here, we explicitly
characterize the structure of constellations of any finite cardinality that are optimal up to first order, or
equivalently, that minimize energy/bit or maximize capacity per unit energy. More importantly, among
constellations of cardinality no greater than $T + 1$ that are first-order optimal, those that maximize the mutual
information up to second order, or equivalently, the wideband slope, are characterized.

In both PPAPR and peak constrained problems, the optimal solutions are obtained in closed-form to 
finite dimensional non-convex optimizations. Moreover, the solutions are established to be both necessary 
and sufficient to optimize their respective information theoretic metrics. Interestingly, the solutions to 

September 22, 2008 DRAFT 



IEEE TRANS. INFORM. TH. 7 

both the PPAPR-constrained and peak-constrained problems are found to be identical. Due to its special 
structure, we refer to the common solution as Space Time Orthogonal Rank one Modulation, or STORM. 

Moreover, in the PPAPR constrained case, STORM (with cardinality $T + 1$) is shown to be near-optimal
even among constellations of unconstrained cardinality, even for modest values of $T$ and PAPR. Hence,
there is not much to be gained by using more than $T + 1$ points in this case. In the peak-constrained case,
we first obtain necessary and sufficient conditions for a constellation of any finite cardinality to achieve
the minimum energy/bit. Among all such constellations, when the cardinality is no greater than $T + 1$,
STORM is established as being both first and second order optimal. Our approach provides a far more
detailed characterization of the first and second order behavior of noncoherent MIMO capacity than in
existing literature. Specifically, we show that when the peak power is less than a certain threshold, it is
possible to have a wideband slope that is non-zero, and obtain the maximum wideband slope achievable
by a $T + 1$ point p.m.f. Moreover, the energy/bit and the wideband slope achieved by STORM reveal a
fundamental energy-vs-bandwidth efficiency tradeoff that enables the determination of the operating (low)
SNR and peak power most suitable for a given application.

It also follows from our analysis and optimization that while the conventional MIMO On-Off Keying 
(OOK) also achieves the minimum energy per bit, STORM has a wideband slope that is T times greater 
which translates into an increase in bandwidth efficiency (or a decrease in the PAPR) by a factor of T in 
the wideband regime. Given typical values of the coherence blocklength T, these gains are potentially 
huge. Our results and conclusions also temper the conclusions of [6] obtained under only the average 
power constraint regarding noncoherent communications over fading channels. 

Among the several new insights that STORM provides on communications in the low SNR regime one 
that runs contrary to conventional wisdom is that, under the practical constraints considered in this work, 
it helps to use all available transmit antennas, not just one, to transmit linearly dependent signals across 
them in the low SNR regime. 

Note that in this work, the input distribution is not a priori assumed or restricted as it is in most prior 
work. STORM is obtained through novel techniques involving non-convex optimization of information 
theoretic measures. Consequently, our approach provides necessary and sufficient conditions for a
constellation to be optimal for the noncoherent MIMO Rayleigh fading channel, resolving a long-standing
open problem. Low duty cycled M-ary FSK (MFSK) [2, 4, 5], which is often proposed to achieve first
order optimality in a SISO channel, is seen to be closely related to a special case of STORM. However,
the zero symbol in STORM is information bearing, which is not the case in low duty cycled MFSK. This
can make STORM have higher achievable rates, especially in the PPAPR-constrained case. Moreover, in
this work, we specify a class of STORM constellations. One subtle insight afforded through different
STORM constellations is that optimal signal constellations need not be peaky in frequency dimension (as
in low duty-cycled MFSK), in addition to being peaky in time dimension. In the process, we discover
a new optimal SISO constellation which may be called "Permuted MFSK" due to its relation to MFSK
but would have better spectral properties in general.

To close this section, some notational conventions used throughout the paper are described. Matrices
are denoted by boldfaced capital letters, and vectors by boldfaced small letters. The symbol $\otimes$
denotes the Kronecker product. The matrices $\mathbf{X}^T$, $\bar{\mathbf{X}}$ and $\mathbf{X}^*$ denote the transpose, complex-conjugate,
and conjugate transpose of $\mathbf{X}$, respectively. Moreover, $\mathrm{tr}(\mathbf{X})$ and $|\mathbf{X}|$ denote the trace and determinant
of the matrix $\mathbf{X}$. The notation $[\mathbf{X}]_{ij}$ refers to the $(i, j)$th element of the matrix $\mathbf{X}$. The notation $\mathbf{X}^{(m)}$
refers to the $m$th row of the matrix $\mathbf{X}$. For an integer $N$, $\mathbf{I}_N$ is an $N \times N$ identity matrix and $\mathbf{1}_N$ is the
$N$ length column vector of ones. The block diagonal matrix with matrices $\mathbf{A}_1, \ldots, \mathbf{A}_N$ along the block
diagonal and zeros elsewhere is denoted as $\mathrm{blockdiag}(\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_N)$. $E[\cdot]$ denotes the expectation
operator. A function $f(p)$ is said to behave as $o(p)$ when $\lim_{p \to 0} f(p)/p = 0$. The symbol $\mathcal{X}^c$ is used
to denote the complement of the set $\mathcal{X}$. The symbol $\preceq$ is used to denote generalized inequality, i.e., if
$\mathbf{A} \preceq \mathbf{B}$ then $\mathbf{B} - \mathbf{A}$ is positive semidefinite (psd). The first and second derivatives of a function $f(x)$ at
$x = c$ are denoted by $\dot{f}(c)$ and $\ddot{f}(c)$, respectively. The function $\log(\cdot)$ always refers to natural logarithm,
unless otherwise specified. Complex, circularly symmetric, Gaussian random vectors with mean $\mathbf{m}$ and
covariance matrix $\mathbf{Q}$ are said to be $\mathcal{CN}(\mathbf{m}, \mathbf{Q})$ distributed.



II. System Model

Consider a MIMO channel with $N_t$ transmit and $N_r$ receive antennas. The random channel matrix
$\mathbf{H} \in \mathbb{C}^{N_t \times N_r}$ is assumed to be constant for a duration of $T$ symbols, after which it changes to an
independent value. It has independent, identically distributed (i.i.d.) $\mathcal{CN}(0, 1)$ entries. The
distribution of $\mathbf{H}$ is known to the transmitter and the receiver. The realizations of $\mathbf{H}$, however, are unknown at
both ends. With the transmitted symbol denoted as $\mathbf{X} \in \mathbb{C}^{T \times N_t}$, the output of the channel can be written
as

$$\mathbf{Y} = \mathbf{X}\mathbf{H} + \mathbf{N}. \tag{1}$$

The entries of the additive noise matrix $\mathbf{N}$ are assumed to be i.i.d. $\mathcal{CN}(0, 1)$ distributed random variables.
The symbol $\mathbf{X}$ is drawn from a finite constellation or alphabet $\mathcal{C}$ with matrix-valued elements.
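As an illustration of the block fading model above, the following is a minimal sketch, not part of the original paper, that draws one coherence block of the channel output for a given transmitted symbol. The helper names `crandn` and `simulate_block` and the all-ones test symbol are hypothetical choices made only for this example.

```python
import numpy as np

def crandn(rng, *shape):
    """i.i.d. CN(0, 1) samples: real and imaginary parts each have variance 1/2."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)

def simulate_block(X, n_rx, rng):
    """Return Y = X H + N for one coherence block; H and N are drawn afresh."""
    T, n_tx = X.shape
    H = crandn(rng, n_tx, n_rx)   # channel realization, unknown to both ends
    N = crandn(rng, T, n_rx)      # additive noise
    return X @ H + N

rng = np.random.default_rng(0)
X = np.ones((4, 2), dtype=complex)       # illustrative symbol with T = 4, N_t = 2
Y = simulate_block(X, n_rx=3, rng=rng)   # Y has T rows and N_r columns
```

Calling `simulate_block` once per block with a fresh symbol mimics the assumption that the channel changes to an independent value after every $T$ symbols.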
Two key cases based on the types of constraints imposed are considered in this work.

(i) PPAPR-constrained case: It is assumed that the average SNR at each receive antenna is constrained
to be $P$ so that

$$\frac{1}{T} E[\mathrm{tr}(\mathbf{X}\mathbf{X}^*)] \le P. \tag{2}$$

Moreover, a peak-power constraint is imposed per space-time slot, namely,

$$\|\mathbf{X}\|_\infty = \max_{m,n} |[\mathbf{X}]_{mn}| \le \sqrt{K}, \quad \forall \mathbf{X} \in \mathcal{C}. \tag{3}$$

This is the most natural and practically meaningful peak-power constraint as it restricts the peak power per
antenna and per time slot (to be at most $K$). It accurately models constraints on individual transmit RF
power amplifiers in practice. The PPAPR constraint is that the ratio $K/P$ is taken to be a fixed constant.
This condition ensures that as the average SNR $P \to 0$, the maximum peak power also goes to zero.

(ii) Peak-constrained case: Here, the average power constraint (2) and the peak-power constraint (3)
are assumed to hold. In this case however, $K$ is assumed to be a fixed constant independent of $P$. In other
words, in contrast to the PPAPR-constrained case, the peak power remains constrained by $K$ (and does
not change) as the average SNR $P \to 0$.

For convenience, we will denote the average energy per block of $T$ symbols as $\mathcal{E} = PT$.

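The noncoherent MIMO Rayleigh fading channel thus described is completely specified by the input
constraints and the transition probability density function (p.d.f.) of $\mathbf{Y}$ conditioned on $\mathbf{X}$ being
transmitted, which is easily seen to be

$$p(\mathbf{Y}|\mathbf{X}) = \frac{\exp\left(-\mathrm{tr}\left\{\mathbf{Y}^* (\mathbf{I}_T + \mathbf{X}\mathbf{X}^*)^{-1} \mathbf{Y}\right\}\right)}{\pi^{T N_r} \left|\mathbf{I}_T + \mathbf{X}\mathbf{X}^*\right|^{N_r}}.$$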
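Since, conditioned on the transmitted symbol, the columns of the received matrix are i.i.d. zero-mean complex Gaussian vectors with covariance $\mathbf{I}_T + \mathbf{X}\mathbf{X}^*$, the log of the transition density can be evaluated numerically. The sketch below is illustrative only; the function name `log_pdf` is a hypothetical choice, and the normalization constant is the standard one for circularly symmetric Gaussian vectors.

```python
import numpy as np

def log_pdf(Y, X):
    """log p(Y | X) for Y of size T x N_r and X of size T x N_t."""
    T, n_rx = Y.shape
    Sigma = np.eye(T) + X @ X.conj().T            # covariance of each column of Y
    _, logdet = np.linalg.slogdet(Sigma)          # log |I_T + X X^*|
    quad = np.trace(Y.conj().T @ np.linalg.solve(Sigma, Y)).real
    return -quad - n_rx * logdet - T * n_rx * np.log(np.pi)

# With the all-zero symbol the density reduces to that of the noise alone.
Y = np.ones((2, 3), dtype=complex)
value = log_pdf(Y, np.zeros((2, 1), dtype=complex))
```

Working in the log domain and using `slogdet` and `solve` avoids forming an explicit matrix inverse, which is a design choice for numerical stability rather than something prescribed by the text.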



Finally, there will also be occasion to use the notion of the peak-to-average power ratio (PAPR) of a
constellation $\mathcal{C}$, which is defined as

$$\mathrm{PAPR} = \max_{m,n} \max_{\mathbf{X} \in \mathcal{C}} \frac{|[\mathbf{X}]_{mn}|^2}{E\{|[\mathbf{X}]_{mn}|^2\}}. \tag{4}$$
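The PAPR of a finite constellation can be computed directly from its points and probabilities. The following sketch assumes the definition is the largest ratio, over space-time slots $(m, n)$, of the peak entry power to the average entry power, and that every slot carries non-zero average power. The function name `papr` and the on-off example are hypothetical.

```python
import numpy as np

def papr(points, probs):
    """Largest ratio over entries (m, n) of peak power to average power."""
    pts = np.asarray(points)                  # shape (L, T, N_t)
    p = np.asarray(probs, dtype=float)
    power = np.abs(pts) ** 2                  # |[X_i]_{mn}|^2 for every point
    peak = power.max(axis=0)                  # max over constellation points
    avg = np.tensordot(p, power, axes=1)      # E{|[X]_{mn}|^2}, assumed non-zero
    return float((peak / avg).max())

# On-off example: a zero symbol with probability 3/4 and an "on" symbol otherwise.
points = [np.zeros((2, 1)), 2.0 * np.ones((2, 1))]
ratio = papr(points, [0.75, 0.25])
```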

III. Maximizing the mutual information at low SNR under the PPAPR constraint 

Consider the above-defined finite input and continuous output noncoherent MIMO Rayleigh fading
channel over which the input constellation $\{\mathbf{X}_i\}_{i=1}^{L}$ is used with corresponding transmission probabilities
$\{P_i\}_{i=1}^{L}$. The mutual information between the transmitted and received signals, normalized by the block
length $T$ (in units of nats/dimension), is thus given as

$$I(\mathbf{X};\mathbf{Y}) = \frac{1}{T} \sum_{i=1}^{L} P_i \int p(\mathbf{Y}|\mathbf{X}_i) \log\left(\frac{p(\mathbf{Y}|\mathbf{X}_i)}{\sum_{j=1}^{L} P_j\, p(\mathbf{Y}|\mathbf{X}_j)}\right) d\mathbf{Y}. \tag{5}$$

A closed form expression for $I(\mathbf{X};\mathbf{Y})$ is unfortunately not known for general SNR. At asymptotically
low SNR however, and when the input signal satisfies certain regularity conditions to avoid inputs being
prohibitively peaky, the authors in [9] show that the mutual information is zero up to first order for the
continuous input and continuous output counterpart of the above channel. Moreover, the mutual
information up to the second order in $P$ is also obtained in closed form through a Taylor series expansion and
without any assumption on the signal structure beyond the regularity conditions. Note that the
expression for mutual information up to second order was also derived earlier in [10] and [11], but with more
stringent conditions on the input distribution.

For the sake of completeness, the key theorem in [9] for the continuous input and continuous output 
channel, slightly modified to account for the different power normalizations in this paper, is stated next. 

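Theorem 1: [9, Theorem 1] Let $p(\mathbf{Y})$ denote the p.d.f. of $\mathbf{Y}$.

1. First order result: If (i) $\partial p(\mathbf{Y})/\partial P$ exists at $P = 0$, and (ii) $\lim_{P \to 0} E[\mathrm{tr}\{(\mathbf{X}\mathbf{X}^*)^2\}]/P = 0$, the mutual
information between the transmitted and received signals $\mathbf{X}$ and $\mathbf{Y}$ is zero to first order in $P$, i.e.,
$I(\mathbf{X};\mathbf{Y}) = o(P)$.

2. Second order result: If, in addition, (i) $\partial^2 p(\mathbf{Y})/\partial P^2$ exists at $P = 0$, (ii) $E[\mathrm{tr}\{(\mathbf{X}\mathbf{X}^*)^2\}] < \infty$ and
(iii) $\lim_{P \to 0} E[\mathrm{tr}\{(\mathbf{X}\mathbf{X}^*)^3\}]/P^2 = 0$, then the mutual information between $\mathbf{X}$ and $\mathbf{Y}$ up to second order in $P$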

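is given by

$$I(\mathbf{X};\mathbf{Y}) = \frac{N_r}{2T}\, \mathrm{tr}\left\{E[(\mathbf{X}\mathbf{X}^*)^2] - (E[\mathbf{X}\mathbf{X}^*])^2\right\} + o(P^2). \tag{6}$$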

The applicability of the above result to the discrete input channel with the PPAPR constraint is
discussed next. Firstly, following the proof of the above theorem in [9], it can be seen to hold for the discrete
input (and continuous output) case and yield the same expression as in (6) for the mutual information, with
the expectations in (6) now taken over the discrete instead of the continuous input of [9]. The existence of the
first and second derivatives of $p(\mathbf{Y})$ at $P = 0$ is easily verified for the problem at hand. With the PPAPR
constraint in effect, the peakiness conditions, namely conditions 1.ii, 2.ii and 2.iii of Theorem 1, are
also easily verified to hold. Hence, it can be concluded that for a discrete input satisfying the
PPAPR constraint (i) the mutual information is zero up to first order in $P$ and (ii) denoting the coefficient
of $P^2$ in the mutual information $I(\mathbf{X};\mathbf{Y})$ of (5) as $I_{\mathrm{low}}$,

$$I_{\mathrm{low}} = \lim_{P \to 0} \frac{I(\mathbf{X};\mathbf{Y})}{P^2} = \lim_{P \to 0} \frac{N_r}{2TP^2}\, \mathrm{tr}\left\{E[(\mathbf{X}\mathbf{X}^*)^2] - (E[\mathbf{X}\mathbf{X}^*])^2\right\}. \tag{7}$$
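For a discrete input, the second order term is a finite sum and can be evaluated directly. The sketch below is a hedged illustration: it assumes the leading coefficient of the second order expression is $N_r/(2T)$, which in the scalar case ($T = N_r = 1$) reduces to one half of the variance of the input power. The function name `second_order_mi` is hypothetical.

```python
import numpy as np

def second_order_mi(points, probs, n_rx):
    """(N_r / (2T)) * tr{ E[(XX^*)^2] - (E[XX^*])^2 } for a discrete input."""
    pts = np.asarray(points, dtype=complex)          # shape (L, T, N_t)
    p = np.asarray(probs, dtype=float)
    T = pts.shape[1]
    grams = pts @ pts.conj().transpose(0, 2, 1)      # X_i X_i^*, shape (L, T, T)
    mean_gram = np.tensordot(p, grams, axes=1)       # E[XX^*]
    mean_sq = np.tensordot(p, grams @ grams, axes=1) # E[(XX^*)^2]
    return n_rx / (2 * T) * np.trace(mean_sq - mean_gram @ mean_gram).real

# Scalar on-off example: |x|^2 is 0 with probability 3/4 and 4 with probability 1/4.
term = second_order_mi(np.array([[[0.0]], [[2.0]]]), [0.75, 0.25], n_rx=1)
```

Dividing the returned value by $P^2$ gives the quantity whose limit defines the second order coefficient for a family of constellations indexed by $P$.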

Evidently, the dominant second order term in the mutual information at low SNR is $I_{\mathrm{low}} P^2$. The
problem of interest is hence to maximize $I_{\mathrm{low}}$ over $\{\mathbf{X}_i\}_{i=1}^{L}$ and $\{P_i\}_{i=1}^{L}$ under an average power constraint
$\sum_{i=1}^{L} P_i\, \mathrm{tr}(\mathbf{X}_i\mathbf{X}_i^*) \le \mathcal{E}$ and a peak power constraint $\|\mathbf{X}_i\|_\infty = \max_{m,n} |[\mathbf{X}_i]_{mn}| \le \sqrt{K}$, $\forall i$.

Before unveiling the solution to the above problem, we note that in [9] the mutual information up to
second order is maximized over continuous input distributions under two different peak power constraints.
The solutions however rely on the assumption that the input signal has the form

$$\mathbf{S} = \mathbf{\Phi}\mathbf{V}, \tag{8}$$

where $\mathbf{\Phi}$ is an isotropically distributed unitary random matrix and $\mathbf{V}$ is a diagonal (random) matrix with
non-negative entries. While this imposition entails no loss of optimality for the case when only the
average power is constrained (which is a seminal result of [18]), it does result in a loss of optimality,
and a significant one at that, when the peak-power constraint of [9] is enforced, which is that the diagonal
entries satisfy $|v_i|^2 \le K$. Due to the suboptimal restriction in (8), the maximizations in [9] lead to the misleading
conclusion that it is optimal to use a single transmit antenna in the low SNR regime. In [11] also, the
authors perform the same maximization over continuous input distributions but under a more relaxed
peak-constraint $\mathrm{tr}(\mathbf{X}\mathbf{X}^*) \le \epsilon$ and conclude that a single antenna should be used. Different from [9]
and [11], the optimization problem considered here does not sub-optimally restrict the signals to be as in
(8), while considering average power constrained discrete inputs and the practically relevant peak-power
constraint per space-time slot. These assumptions result in a significantly different and more challenging
problem than those considered in [9] or [11]. Indeed, in contrast to [9] or [11], our results indicate that in
the PPAPR-constrained problem, at sufficiently low SNR, it actually helps to use all transmit antennas.

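For the PPAPR-constrained problem, the set of all feasible constellations with cardinality $L$ is denoted
as $\mathcal{S}_L$ and can be described as

$$\mathcal{S}_L = \left\{ (\mathbf{X}_i, P_i)_{i=1}^{L} : P_i \ge 0,\ \mathbf{X}_i \in \mathbb{C}^{T \times N_t},\ \sum_{i=1}^{L} P_i = 1,\ \sum_{i=1}^{L} P_i\, \mathrm{tr}(\mathbf{X}_i\mathbf{X}_i^*) \le \mathcal{E},\ \|\mathbf{X}_i\|_\infty \le \sqrt{K},\ \forall i \right\}.$$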

It is assumed, without loss of generality, that $K N_t T > \mathcal{E}$, because otherwise the average power
constraint cannot be active and one can therefore solve the problem by changing the average power constraint
to $\mathcal{E}' = K N_t T$. Let the PPAPR be denoted as $\zeta = K/P$, a constant in the PPAPR-constrained case as $P$
varies.
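Membership in the feasible set amounts to four elementary checks: non-negative probabilities that sum to one, an average energy per block no larger than the budget, and entries whose modulus does not exceed $\sqrt{K}$. A minimal sketch follows; the function name `is_feasible` and the tolerance `tol` are assumptions introduced for this example.

```python
import numpy as np

def is_feasible(points, probs, E, K, tol=1e-12):
    """Check whether the pairs (X_i, P_i) satisfy the constraints of the feasible set."""
    pts = np.asarray(points, dtype=complex)     # shape (L, T, N_t)
    p = np.asarray(probs, dtype=float)
    # sum_i P_i tr(X_i X_i^*), using tr(X X^*) = sum of squared entry moduli
    energy = np.sum(p * np.sum(np.abs(pts) ** 2, axis=(1, 2)))
    return bool(
        np.all(p >= 0)
        and abs(p.sum() - 1.0) <= tol
        and energy <= E + tol
        and np.abs(pts).max() <= np.sqrt(K) + tol
    )

points = [np.zeros((2, 1)), np.ones((2, 1))]
ok = is_feasible(points, [0.5, 0.5], E=1.0, K=1.0)
```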
Let $I^*_{\mathrm{low},L}$ be the maximum mutual information up to second order achievable by any constellation in
the set $\mathcal{S}_L$, so that

$$I^*_{\mathrm{low},L} = \max_{\mathcal{S}_L} I_{\mathrm{low}}. \tag{9}$$

Note that when $P_i = 0$, the symbol $\mathbf{X}_i$ is not used and therefore the set of feasible constellations in $\mathcal{S}_{L'}$
is included in the set $\mathcal{S}_L$ for any $L' < L$. Hence, $I^*_{\mathrm{low},L}$ is the maximum mutual information up to second
order achievable by any constellation of cardinality no greater than $L$. The maximum mutual information
up to second order when there is no upper limit on the cardinality of the discrete input constellation
is defined as $I^*_{\mathrm{low}} = \lim_{L \to \infty} I^*_{\mathrm{low},L}$. It will be shown in what is to follow that $I^*_{\mathrm{low},T+1}$ (and the
associated constellation of size $T + 1$) is near-optimal in that it can be very close to $I^*_{\mathrm{low}}$ (and the as yet
unknown constellation which achieves the latter).

The following theorem is one of the main results in this paper. 

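Theorem 2: (PPAPR-constrained case) Let the coherence time $T \ge 2$ and let $\zeta = K/P$ denote the PPAPR.
When $L \le T + 1$, the maximum second order mutual information with an $L$-point input constellation is given as

$$I^*_{\mathrm{low},L} = \frac{N_r N_t T}{2} \left[\zeta - \frac{1}{(L-1) N_t}\right]. \tag{10}$$

An $L$ point constellation (or p.m.f.) achieves $I^*_{\mathrm{low},L}$ with $L \le T + 1$ if and only if (iff) it is of the following
form

$$(\mathbf{X}_i, P_i) = \left(\sqrt{K}\, \mathbf{v}_i \mathbf{w}_i^*,\ \frac{1}{(L-1)\, \zeta N_t}\right), \quad 1 \le i \le L - 1, \tag{11}$$

$$(\mathbf{X}_L, P_L) = \left(\mathbf{0}_{T \times N_t},\ 1 - \frac{1}{\zeta N_t}\right), \tag{12}$$

where for each $i$, $\mathbf{v}_i \in \mathbb{C}^{T \times 1}$ is the $i$th column of a unitary matrix $\mathbf{V}$, $\mathbf{w}_i \in \mathbb{C}^{N_t \times 1}$ and

$$\left|[\mathbf{v}_i \mathbf{w}_i^*]_{mn}\right| = 1, \quad \forall i, m, n. \tag{13}$$

Furthermore, $I^*_{\mathrm{low}}$, the maximum second order mutual information with an unconstrained cardinality, is
bounded above and below as

$$I^*_{\mathrm{low},L}\big|_{L=T+1} = \frac{N_r N_t T}{2} \left[\zeta - \frac{1}{T N_t}\right] \le I^*_{\mathrm{low}} \le \frac{N_r N_t T}{2}\, \zeta. \tag{14}$$

Proof: The proof is given in Section III-B. ■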
The optimal signal constellation for $L = T + 1$ given in Theorem 2 can be viewed as a space-time
code (employing unequal transmission probabilities) that achieves the maximum mutual information up
to second order at low SNR. Based on its structure, it is referred to as Space Time Orthogonal Rank
one Modulation (STORM) because each non-zero matrix is of unit rank and is orthogonal to the other
constellation matrices by construction. Two examples of matrices that can be used for the unitary matrix
$\mathbf{V}$ are the Discrete Fourier Transform (DFT) matrix and the Hadamard matrix (when it exists). In one
embodiment of STORM, $\mathbf{w}_i$ can be chosen to be $\mathbf{1}_{N_t}$ $\forall i$. In this case, each of the $L - 1 = T$ non-zero
constellation points is formed from a column of $\mathbf{V}$ and this column is repeated over the $N_t$ antennas. The
$L$th point is of course the all-zero matrix.
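The DFT-based embodiment can be built and checked numerically. The sketch below rests on several stated assumptions rather than on the paper's own code: the DFT matrix is taken unnormalized so that its entries have unit modulus, the all-ones vector is used across antennas, the non-zero points are equiprobable with total probability $P/(K N_t)$ (which makes the average power constraint active), and the second order coefficient carries the factor $N_r/(2T)$. The function names and parameter values are hypothetical.

```python
import numpy as np

def storm(T, n_tx, K, P):
    """T + 1 point STORM from an unnormalized DFT matrix with all-ones antenna weights."""
    zeta = K / P                                    # PPAPR
    k = np.arange(T)
    V = np.exp(-2j * np.pi * np.outer(k, k) / T)    # unit-modulus entries, V^* V = T I
    w = np.ones(n_tx)
    points = [np.sqrt(K) * np.outer(V[:, i], w.conj()) for i in range(T)]
    points.append(np.zeros((T, n_tx), dtype=complex))   # all-zero symbol
    p_on = 1.0 / (T * zeta * n_tx)                  # probability of each non-zero point
    probs = [p_on] * T + [1.0 - 1.0 / (zeta * n_tx)]
    return np.array(points), np.array(probs)

def second_order_coeff(points, probs, n_rx, P):
    """(N_r / (2 T P^2)) * tr{ E[(XX^*)^2] - (E[XX^*])^2 }."""
    T = points.shape[1]
    grams = points @ points.conj().transpose(0, 2, 1)
    mean_gram = np.tensordot(probs, grams, axes=1)
    mean_sq = np.tensordot(probs, grams @ grams, axes=1)
    return n_rx / (2 * T * P**2) * np.trace(mean_sq - mean_gram @ mean_gram).real

T, n_tx, n_rx, K, P = 4, 2, 1, 0.02, 0.01
pts, pr = storm(T, n_tx, K, P)
avg_energy = sum(p * np.trace(X @ X.conj().T).real for X, p in zip(pts, pr))
numeric = second_order_coeff(pts, pr, n_rx, P)
closed_form = n_rx * n_tx * T / 2 * (K / P - 1 / (T * n_tx))
```

Under these assumptions the numerically evaluated coefficient agrees with the closed-form value $\frac{N_r N_t T}{2}\left[\zeta - \frac{1}{T N_t}\right]$, the average energy equals $PT$, and every non-zero entry has modulus $\sqrt{K}$.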

It can be seen that for STORM, the PAPR as defined in (4) is $\zeta N_t \ge 1$. Clearly, the ratio
between the lower and upper bounds on $I^*_{\mathrm{low}}$ in (14) is nearly equal to unity when $\zeta N_t T \gg 1$. This is
evidently true even for moderate and practical values of PAPR and $T$. As an example, for $\zeta = 2$, $N_t = 2$
and $T = 4$, the ratio is 0.94. Hence, even for moderate values of PAPR and $T$, the $(T+1)$-point STORM
almost achieves $I^*_{\mathrm{low}}$ (the limit with unconstrained cardinality) and there is not much to be gained by
using more than $T + 1$ points.
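
The quoted ratio can be reproduced with a one-line computation. The sketch below assumes that the two bounds in (14) share a common prefactor and differ only in the bracketed terms $\zeta - 1/(T N_t)$ and $\zeta$, which is what the condition $\zeta N_t T \gg 1$ and the value 0.94 suggest; the function name `bound_ratio` is hypothetical.

```python
def bound_ratio(zeta, Nt, T):
    """Lower bound over upper bound: (zeta - 1/(T*Nt)) / zeta = 1 - 1/(zeta*Nt*T)."""
    return 1.0 - 1.0 / (zeta * Nt * T)

ratio_example = bound_ratio(zeta=2, Nt=2, T=4)   # the example quoted in the text
ratio_large_T = bound_ratio(zeta=2, Nt=2, T=64)  # approaches unity as zeta*Nt*T grows
```

For the quoted example the ratio evaluates to 0.9375, which rounds to the 0.94 given in the text.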



A. Remarks

Since STORM achieves a significant fraction of $I^*_{\mathrm{low}}$ even for moderate values of PAPR and $T$, the following
insights from its structure and from the mutual information up to second order that it achieves at low SNR are of
interest. For brevity, the mutual information up to second order at low SNR is simply referred to as
mutual information in the rest of this section.

1. It can be seen that the mutual information of STORM increases linearly with the maximum peak
power $K$. That it is an increasing function is to be expected since peaky signaling is known to achieve the
noncoherent capacity in the low SNR regime when there is only an average power constraint. Moreover,
the mutual information also increases linearly with the product $N_t N_r$ of the numbers of transmit and receive
antennas. The use of a single antenna is evidently suboptimal by a factor of $N_t$.

2. A reason that is often cited in the literature for explaining the efficacy of using a single antenna at
low SNR is that the number of channel parameters that are to be implicitly estimated is the least in this
case. The use of a single antenna however is not necessary to ensure this and can even be detrimental
to performance as explained above. Consider STORM, where the received signal when the $i$-th non-zero
signal is transmitted is

$$ Y = \sqrt{K}\, v_i w_i^* H + N = \sqrt{K}\, v_i h^T + N, \qquad (15) $$

where $h^T = w_i^* H$ and so $h$ is $\mathcal{CN}(0, N_t I_{N_r})$ distributed. Therefore, the effective channel (15) does in
fact involve only $N_r$ (and not $N_t N_r$) unknown channel coefficients even though all transmit antennas are
used. The optimality of the unit rank structure of STORM could thus indeed be attributed to the difficulty
of (implicit) estimation of $N_t N_r$ coefficients at low SNR, because it avoids this task by focusing the power
on just $N_r$ effective unknown path gains, while at the same time making use of all the transmit antennas.

3. Consider the case when $w_i = 1_{N_t}\ \forall\, i$ in (11), which is sufficient for the $(T+1)$-point STORM to be
optimal. Then the symbols sent by all transmit antennas at any given time are identical and the fading
gains effectively add up at each receive antenna. So, why not just use a single transmit antenna? All
transmit antennas must be used because otherwise the effective received power is smaller due to the
peak-power constraint, which limits the symbol power per antenna and per time slot.

4. A canonical embodiment of STORM is one that results from setting $w_i = 1_{N_t}\ \forall\, i$ and $V = [v_1 \cdots v_T]$
to be a $T$-dimensional DFT matrix in (11). A convenient feature in this DFT version of STORM is that
the entries of the signal matrices can be transmitted using PSK symbols with an additional zero point.
Alternatively, a $T$-dimensional Hadamard matrix can be used for $V$ (when it exists). The advantage of
using a Hadamard matrix is that it is enough to transmit real symbols for each entry, specifically, BPSK
and an additional zero point. Hadamard matrices of dimension $T$ exist when $T = 2^n$ for any natural
number $n$ and also for many multiples of 4. In Appendix-B, we show how block decoding of STORM
may be simplified using either the Fast Fourier Transform (FFT) or the Fast Hadamard Transform (FHT),
when $L - 1$ is a power of 2.

5. Consider the special case when there is only a peak-constraint on the input (i.e., $K N_t T = E$). Here,
it can be seen that STORM has no zero point (so $L = T$) and is given by

$$ (X_i, P_i) = \left( \sqrt{K}\, v_i w_i^*,\ \frac{1}{T} \right), \quad 1 \le i \le T. \qquad (16) $$

Hence, all points are equiprobable and the PAPR is unity, thus facilitating practical implementation.
Moreover, this constellation is near-optimal when there is only a peak constraint and when $T \gg 1$, as
seen from the bounds on $I^*_{\mathrm{low}}$ in (14) of Theorem 2.

6. The canonical version of STORM can be seen as a form of generalized $(T+1)$-ary ON-OFF
signaling with repetition coding across the transmit antennas and with unequal probabilities of ON and OFF
signaling, with the ON signaling actually being the classical T-ary, equiprobable Frequency Shift Keying 
(T-FSK). The larger the allowed PPAPR, the higher the probability of the OFF signal. In fact, STORM 
takes advantage of all the peak power allowed for each space-time slot when transmitting non-zero sym- 
bols while meeting the average power constraint by the inclusion of the zero symbol with as high a 
probability as the PPAPR constraint would allow. 
7. Consider the special case of a SISO system when there is only a per time-slot peak power constraint $K$.
Here, Theorem 2 establishes the second order optimality of equiprobable T-FSK at low SNR among all
T-ary constellations, and the near-optimality under unconstrained cardinality when $T \gg 1$. In general
for SISO systems however, depending on the peak and average power constraints, an additional zero
signal is needed, with a probability different from that of the $T$ (equiprobable) T-FSK signals.

8. The mutual information of STORM may be expressed as ^ {KNt - For fixed K, T, and E, it 
increases linearly with $N_t$. This may be attributed to the fact that increasing $N_t$ with fixed $K$, $T$ and $E$
increases the overall peak-power $\mathrm{tr}(XX^*) = \zeta N_t E$, while simultaneously decreasing the probability of
transmitting a non-zero signal, thereby making the signals more peaky in the time domain. On the
other hand, when $T$ is increased for a fixed $N_t$ and $E$, the overall peak-power $\mathrm{tr}(XX^*) = \zeta N_t E$ and
the probability of transmitting the zero signal, $1 - \frac{1}{\zeta N_t}$, remain fixed but the mutual information increases with
$T$. To get some insight on why this is so, consider the canonical version of STORM. An increase in $T$
implies that the T-FSK transmissions (repeated over each antenna) become more peaky in the frequency
domain.¹

9. STORM constellations other than the canonical ones can also be constructed. For example, one can
use the inverses of the DFT and Hadamard matrices for a choice of $V$. More generally, if $V$ is unitary
with unit-magnitude elements, so is $V' = PVQ$ where $P$ and $Q$ are $T \times T$ permutation matrices. $Q$
only permutes the columns of $V$, thereby renumbering the signals and leaving the STORM constellation
unchanged. However, row permutations induced by $P$ would result in constellations that are no longer peaky
in the frequency domain as compared to the canonical DFT version of STORM. It is unclear as to how the
complete class of STORM constellations can be constructed. In this regard, note that the $w_i$ vectors can
be arbitrary as long as their elements have unit magnitude. So "repetition" across transmit antennas can
involve arbitrary phase rotations or multiplication by possibly distinct unit-magnitude complex numbers.

10. The cutoff rate for the discrete input (of cardinality L) and continuous output channel is given by 



The cutoff rate was initially advocated as a design criterion for modulation schemes in [19] and [20]. It is
a lower bound on the random coding exponent, and also provides an exponentially accurate description of
the attainable error probability when communicating at the critical rate [19]. Let the argument of max(·)
in (17) be denoted as the cutoff rate expression.

¹This was pointed out by a reviewer.

For the noncoherent MIMO channel at low SNR, the
cutoff rate expression is easily shown to be (cf. [21])

CRiow = ^5^FiPitr{(X,X*-X,X*)'}+o(p2) (18) 

An interesting property of $CR_{\mathrm{low}}$ [16] is that when the input constellation satisfies the regularity
conditions, $\lim_{\rho \to 0} CR_{\mathrm{low}} / I_{\mathrm{low}} = \frac{1}{2}$. In the limit of low SNR therefore, $CR_{\mathrm{low}}$ behaves identically to the mutual
information up to this constant factor. Therefore, the $(T+1)$-point STORM also maximizes the cutoff rate expression up to second
order at low SNR.

11. An often used noncoherent constellation design criterion (cf. [22, 23]) is to maximize the worst-case
chordal distance, which is given by $\min_{i \ne j} \mathrm{tr}\left[ I - X_i^* X_j X_j^* X_i \right]$. For STORM, the worst-case chordal
distance is the maximum possible since for every $i \ne j$, $X_i^* X_j = 0_{N_t \times N_t}$. Moreover, the difference between
any two distinct matrices in STORM has unit rank, and hence the scheme would have a diversity order
of $N_r$ at high SNR if employed as a coherent space-time code [24], whereas constellation design at high
SNR for the coherent MIMO channel is typically geared towards achieving maximum diversity ($N_t N_r$).
Theorem 2 shows that optimal noncoherent constellations at low SNR have quite the opposite properties
from good coherent constellations at high SNR.

12. Subsequent to the conference version of this paper [25] (see also [26]), Sethuraman et al. [27] consider
a MIMO Rayleigh fading channel with the noncoherent assumption and with the fading process
modeled as stationary and ergodic, as well as correlated over time. The authors characterize input distri- 
butions which are optimal for the stationary and ergodic MIMO channel, under average-power constraints 
and peak-constraints which are per space-time slot similar to the PPAPR-constrained case here. Interest- 
ingly, one distribution identified in [27] which achieves the capacity up to second order can be seen to be 
closely related to the canonical version of STORM here. While this distribution is obtained for a different 
fading process, the channel coherence time T here can be thought of as playing the same role as channel 
memory in [27]. 

B. Proof of Theorem 2 

In this subsection, the proof of Theorem 2 is given. The following definitions and lemmas are needed 
first from [28]. 

Definition 1: A convex maximization problem is an optimization problem of the following form:

$$ \max_{x \in \mathcal{X}} f(x), \qquad (19) $$

where $f(x)$ is a convex function and $\mathcal{X} \subset \mathbb{R}^n$ is a convex set.
Definition 2: A point $x$ on the boundary of a convex set $\mathcal{X}$ is called an extreme point if there are no
distinct points $x_1, x_2 \in \mathcal{X}$ such that $x = \lambda x_1 + (1 - \lambda) x_2$, $0 < \lambda < 1$.

Lemma 1: A closed, bounded convex set in $\mathbb{R}^n$ is the convex hull of its extreme points.

Lemma 2: The global maximum of a convex function $f$ over a compact convex set $\mathcal{X}$ is attained at
an extreme point of $\mathcal{X}$. A point in a compact convex set $\mathcal{X}$ is a global maximizer of a strictly convex
function $f$ only if it is an extreme point of $\mathcal{X}$.

Definition 3: A polyhedron is defined to be the set of points $\mathcal{P} = \{ x \in \mathbb{R}^n : A x \le b \}$, where
$A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. A bounded polyhedron is called a polytope. The extreme points of a polytope
are referred to as vertices.

The next lemma gives the necessary and sufficient conditions for a point to be a vertex of a general 
polytope. 

Lemma 3: With the same notation as in Definition 3, let $a_i^T$, $1 \le i \le m$, denote the rows of the
matrix $A$. Further, for $x \in \mathcal{P}$, let $\mathcal{I} = \{ i \in \{1, \ldots, m\} : a_i^T x = b_i \}$ describe the inequalities which are
binding (active) at $x$, and let $A_{\mathcal{I}}$ be the matrix with rows $a_i^T$, $i \in \mathcal{I}$. Then $x \in \mathcal{P}$ is a vertex of $\mathcal{P}$ iff

$$ \mathrm{rank}(A_{\mathcal{I}}) = n. $$

The following lemma more sharply specifies the vertices of a special polytope which will be useful in 
the proof of Theorem 2. 
Lemma 4: Consider the polytope defined by

$$ \mathcal{D} = \left\{ d : \sum_{i} P_i d_i \le E,\ 0 \le d_i \le Q,\ i = 1, \ldots, L \right\}, \qquad (20) $$

which is the intersection of the half-plane $\sum_i P_i d_i \le E$ and the hyper-cube $0 \le d_i \le Q$. Each vertex of
$\mathcal{D}$ consists of $L - 1$ entries that are either $Q$ or 0, and exactly one entry $c$ such that $0 \le c \le Q$.

Proof: The polytope $\mathcal{D}$ can be expressed in the standard form $A d \le b$ given in Definition 3, by
setting

$$ A = \begin{bmatrix} q^T \\ I_L \\ -I_L \end{bmatrix}_{(2L+1) \times L} \quad \text{and} \quad b = \begin{bmatrix} E \\ Q\, 1_L \\ 0_L \end{bmatrix}, \qquad (21) $$

where $q = [P_1\ P_2\ \cdots\ P_L]^T$. Let $x$ be a vertex of the polytope described by $A d \le b$. Then, the rows
of $A$ which satisfy $a_i^T x = b_i$ should form a matrix with rank $L$ by Lemma 3. If $x$ is a vertex for which
$q^T x = E$ then there are at least $L - 1$ more linearly independent rows of $A$ that correspond to active
constraints. Suppose $k$ of them are of the form $x_j = Q$ for $j \in \mathcal{J} \subset \{1, 2, \ldots, L\}$; then at least $L - 1 - k$
active constraints (out of the remaining $L - k$ constraints) must be of the form $x_j = 0$ for $j \notin \mathcal{J}$.
Hence, at most one entry of $x$ can lie anywhere between 0 and $Q$ (call it $c$). If $x$ is such that $q^T x < E$,
then of course it is a vertex by Lemma 3 iff $x_j = Q$ for all $j$ in the subset $\mathcal{J} \subset \{1, 2, \ldots, L\}$ for which
$\sum_{j \in \mathcal{J}} P_j Q < E$ and $x_j = 0\ \forall\, j \notin \mathcal{J}$ (there are as many such vertices as there are subsets $\mathcal{J}$ for
which $Q \sum_{j \in \mathcal{J}} P_j < E$). In this case, all the entries of the vertex are either $Q$ or 0 (set $c = 0$ or $Q$). ■
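
Lemma 4 can be checked on a small instance by brute force with the rank test of Lemma 3: pick $L$ of the $2L+1$ constraints, solve them as equalities, and keep the feasible solutions. The sketch below is illustrative only; the helper names and the numerical values of $P$, $E$ and $Q$ are hypothetical, and exact rational arithmetic is used so that no tolerances are needed.

```python
from fractions import Fraction as F
from itertools import combinations

def solve(rows, rhs):
    """Solve a square linear system exactly (Gauss-Jordan); return None if singular."""
    n = len(rows)
    M = [list(r) + [b] for r, b in zip(rows, rhs)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return None  # the chosen active rows do not have full rank
        M[col], M[piv] = M[piv], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                M[r] = [x - M[r][col] * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def vertices(P, E, Q):
    """All vertices of {d : sum(P_i d_i) <= E, 0 <= d_i <= Q}, in the form A d <= b of (21)."""
    L = len(P)
    eye = [[F(int(i == j)) for j in range(L)] for i in range(L)]
    A = [list(P)] + eye + [[-x for x in row] for row in eye]
    b = [E] + [Q] * L + [F(0)] * L
    found = set()
    for idx in combinations(range(2 * L + 1), L):
        x = solve([A[i] for i in idx], [b[i] for i in idx])
        if x is not None and all(
            sum(a * xi for a, xi in zip(A[i], x)) <= b[i] for i in range(2 * L + 1)
        ):
            found.add(tuple(x))
    return found

# Hypothetical instance: L = 3, probabilities (1/2, 1/3, 1/6), E = 1, Q = 3.
P = [F(1, 2), F(1, 3), F(1, 6)]
V = vertices(P, E=F(1), Q=F(3))
fractional_counts = [sum(1 for x in v if 0 < x < 3) for v in V]
```

For this instance every vertex found has at most one entry strictly between 0 and $Q$, as the lemma states.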
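Proof (of Theorem 2): The problem that needs to be solved here is essentially

$$ \max_{\{P_i\}_{i=1}^{L},\ \{X_i\}_{i=1}^{L}} I_{\mathrm{low}} \qquad (22) $$

$$ \text{subject to} \quad \sum_i P_i\, \mathrm{tr}(X_i X_i^*) \le E, \qquad \|X_i\|_\infty \le \sqrt{K}\ \ \forall\, i, \qquad \sum_i P_i = 1,\ P_i \ge 0\ \ \forall\, i, $$

where $I_{\mathrm{low}}$ is given in (7). Maximizing $I_{\mathrm{low}}$ is equivalent to maximizing

$$ \sum_i P_i\, \mathrm{tr}(X_i X_i^* X_i X_i^*) - \mathrm{tr}\Big( \sum_i P_i X_i X_i^* \sum_j P_j X_j X_j^* \Big) \qquad (23) $$

$$ = \sum_i P_i (1 - P_i)\, \mathrm{tr}(X_i X_i^* X_i X_i^*) - \sum_{i \ne j} P_i P_j\, \mathrm{tr}(X_j^* X_i X_i^* X_j) \qquad (24) $$

$$ \le \sum_{i=1}^{L} P_i (1 - P_i)\, \mathrm{tr}(X_i X_i^* X_i X_i^*). \qquad (25) $$

Since terms of the form $\mathrm{tr}(X_j^* X_i X_i^* X_j)$ are non-negative, (25) follows by replacing all negative terms
in (24) by zero. Let $x_{ik}$ denote the $k$-th column of the matrix $X_i$. Equality in (25) occurs iff $x_{ik}^* x_{jl} = 0$
$\forall\, k, l$ and $j \ne i$. The strategy is to maximize the bound in (25) and show later that the signal constellation
that maximizes it achieves equality in (25) when $L \le T + 1$, thereby maximizing $I_{\mathrm{low}}$
in these cases. So, let us consider the optimization problem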

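$$ \max_{\{P_i\}_{i=1}^{L},\ \{X_i\}_{i=1}^{L}} \sum_i P_i (1 - P_i)\, \mathrm{tr}(X_i X_i^* X_i X_i^*) \qquad (26) $$

$$ \text{subject to} \quad \sum_i P_i\, \mathrm{tr}(X_i X_i^*) \le E, \qquad \|X_i\|_\infty \le \sqrt{K}\ \ \forall\, i, \qquad \sum_i P_i = 1,\ P_i \ge 0\ \ \forall\, i. $$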
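
The identity between (23) and (24) and the bound (25) can be checked numerically on random matrices. The following is a minimal sketch with hypothetical helper names and arbitrary dimensions; it only illustrates the algebra and does not impose the power or peak constraints.

```python
import random

def mm(A, B):
    """Matrix product of nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def hc(A):
    """Conjugate transpose."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def tr(A):
    """Real part of the trace (the traces used here are real up to rounding)."""
    return sum(A[i][i] for i in range(len(A))).real

rng = random.Random(0)
T, Nt, L = 3, 2, 4
X = [[[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(Nt)] for _ in range(T)]
     for _ in range(L)]
w = [rng.random() for _ in range(L)]
P = [x / sum(w) for x in w]                       # an arbitrary p.m.f.
G = [mm(Xi, hc(Xi)) for Xi in X]                  # G_i = X_i X_i^*, a T x T matrix

mean = [[sum(P[i] * G[i][a][b] for i in range(L)) for b in range(T)] for a in range(T)]
expr23 = sum(P[i] * tr(mm(G[i], G[i])) for i in range(L)) - tr(mm(mean, mean))
cross = sum(P[i] * P[j] * tr(mm(G[i], G[j]))      # tr(G_i G_j) = tr(X_j^* X_i X_i^* X_j)
            for i in range(L) for j in range(L) if i != j)
bound25 = sum(P[i] * (1 - P[i]) * tr(mm(G[i], G[i])) for i in range(L))
expr24 = bound25 - cross
```

Here `expr23` and `expr24` agree to rounding error, and `cross` is non-negative, so `bound25` upper-bounds both; the bound is tight exactly when all cross terms vanish.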

In Appendix-A, a simple argument is given that shows that the maximization of (26) is a non-convex
optimization problem. A two-stage approach is thus adopted for solving the optimization in (26). In
the first stage, the objective function is maximized over $\{X_i\}_{i=1}^{L}$ while holding $\{P_i\}_{i=1}^{L}$ fixed. In the
second stage, the resulting objective function is maximized over $\{P_i\}_{i=1}^{L}$. Furthermore, it is shown that
the optimization in the first stage can be split into two successive convex maximization problems and
the optimization in the second stage is a convex minimization problem. It is this nice structure that is
exploited to obtain the signal matrices $\{X_i\}_{i=1}^{L}$ and the probabilities $\{P_i\}_{i=1}^{L}$ that jointly optimize the
upper bound on mutual information (up to second order at low SNR) in (26).

Consider first the optimization in (26) over $\{X_i\}_{i=1}^{L}$ for fixed $\{P_i\}_{i=1}^{L}$. This problem is decomposed
into two steps. In the first step, $\mathrm{tr}(X_i X_i^*) = d_i$ is fixed for some $\{d_i\}_{i=1}^{L}$ and the best set of $\{X_i\}_{i=1}^{L}$ is
found. Note that $d_i$ is equal to the energy of the $i$-th signal and because of the peak-power constraint, it
is sufficient to restrict $d_i \in [0, K N_t T]$. In the second step, the resulting objective function is optimized
over $d_i$, $i = 1, \ldots, L$. Geometrically, we first find the matrices $\{X_i\}_{i=1}^{L}$ that maximize the objective
function over the contour $\mathrm{tr}(X_i X_i^*) = d_i\ \forall\, i$ and then optimize the resulting objective over $\{d_i\}_{i=1}^{L}$,
thereby obtaining the best contour for an arbitrary but fixed $\{P_i\}_{i=1}^{L}$. As is shown below, both these
problems can be solved as convex maximization problems.

With $\mathrm{tr}(X_i X_i^*) = d_i \in [0, K N_t T]$, it is clear that the objective function in (26) is maximized when
for each $i$, $X_i$ is chosen according to

$$ \max_{\substack{\mathrm{tr}(X_i X_i^*) = d_i \\ \|X_i\|_\infty \le \sqrt{K}\ \forall\, i}} \mathrm{tr}(X_i X_i^* X_i X_i^*). \qquad (27) $$

Let the eigenvalues of the positive semidefinite matrix $X_i X_i^*$ be $\{\lambda_m\}_{m=1}^{T}$ (the dependence on $i$ is implicit).
Then, the solution of (27) is upper bounded by the solution of

$$ \max_{\substack{\lambda_m \ge 0\ \forall\, m \\ \sum_m \lambda_m \le d_i}} \sum_m \lambda_m^2, \qquad (28) $$

with equality iff the additional constraints $\|X_i\|_\infty \le \sqrt{K}$ hold for each $i$ for the matrix that achieves
the maximum in (28). Since the objective function in (28) is strictly convex while the constraint set is a
polytope, the problem in (28) is a strictly convex maximization problem. Hence by Lemma 2, the global
optimum is attained at a vertex of the constraint set. In this case, the constraint polytope has $T + 1$
vertices which can be found by inspection to be

$$ [0\ \ldots\ 0]^T,\ [d_i\ 0\ \ldots\ 0]^T,\ [0\ d_i\ \ldots\ 0]^T,\ \ldots,\ [0\ \ldots\ 0\ d_i]^T \qquad (29) $$

since none of them can be expressed as a convex combination of any other points in the set, and any point
in the set can be expressed as a convex combination of the points in (29). Now, since all the vertices
except the all-zero vector give the same value $d_i^2$ for the objective function, $d_i^2$ is the sought maximum.
This in turn implies that all the matrices $\{X_i\}_{i=1}^{L}$ have to be of unit rank for the objective functions to
achieve their maximum value of $d_i^2$ for each $i$ (we adopt the convention that the all zero matrix is of unit
rank). Let the number of matrices in $\{X_i\}_{i=1}^{L}$ which are not $0_{T \times N_t}$ be $L'$. If more than one of the $d_i$'s
are zero, they would all correspond to the same zero signal point $0_{T \times N_t}$ and their respective probabilities
would simply add up, resulting in one effective zero symbol matrix. Therefore, $L = L' + 1$ or $L = L'$
depending on whether or not there is a zero symbol.
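
The vertex argument for (28) can be illustrated numerically: random feasible eigenvalue vectors never exceed the value $d_i^2$ attained by placing all the energy on a single eigenvalue. This is only a sketch; the function name, sampling scheme and parameter values are hypothetical choices.

```python
import random

def best_random_value(d, T, trials=2000, seed=1):
    """Largest sum of squares over random points with lam_m >= 0 and sum(lam) <= d."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        cuts = sorted(rng.random() for _ in range(T))
        # Consecutive gaps of sorted uniforms are non-negative and sum to cuts[-1] <= 1.
        lam = [d * (hi - lo) for lo, hi in zip([0.0] + cuts[:-1], cuts)]
        best = max(best, sum(x * x for x in lam))
    return best

d, T = 2.0, 4
vertex_value = sum(x * x for x in [d] + [0.0] * (T - 1))  # rank-one choice: value d**2
sampled_best = best_random_value(d, T)
```

The sampled maximum stays below `vertex_value`, consistent with the conclusion that the optimal $X_i$ has unit rank.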

When $L' \le T$, consider the following constellation $\{X_i\}_{i=1}^{L}$,

$$ X_i = \sqrt{\frac{d_i}{N_t T}}\, v_i w_i^*, \quad d_i > 0 \qquad (30) $$

$$ X_i = 0_{T \times N_t}, \quad d_i = 0, \qquad (31) $$

where the vectors $v_i$ and $w_i$ are constrained as in (13). Note that the matrices in (30, 31) are of
unit rank and satisfy $\mathrm{tr}(X_i X_i^*) = d_i$, $1 \le i \le T$. Hence they solve the problem in (28). Now, since
$d_i \le K N_t T$, using (13), it follows that $\|X_i\|_\infty \le \sqrt{K}\ \forall\, i$ and hence they also solve the problem in (27).
Moreover, since $L' \le T$, any pair of different constellation matrices have orthogonal columns (since the $v_i$'s
are orthogonal), which ensures that (25) holds with equality. It will eventually be shown that the optimal
values of the non-zero $\{d_i\}$ are all equal, with $d_i = K N_t T\ \forall\, i$. This in turn implies that the structure in
(30) and (31) is also necessary.

When $L' > T$, the set of $v_i$ in (30) can no longer be selected to be orthogonal to each other. Nevertheless,
a set of rank one matrices with the structure given in (30) but with a non-orthogonal set of $v_i$ (normalized
in the same way) still solves both (27) and (28). Therefore, the bound in (25) evaluated at such matrices, namely
$\sum_{i=1}^{L} P_i (1 - P_i) d_i^2$ scaled by the positive constant that relates (23) to $I_{\mathrm{low}}$ in (7),
serves as an upper bound on the maximum mutual information up to second order achievable by any
constellation of cardinality $L = L' + 1$, which is $I^*_{\mathrm{low},L}$.

In summary, the best constellation $\{X_i\}_{i=1}^{L}$ can be specified for any set of non-negative $\{d_i\}_{i=1}^{L}$. It
remains to find the best $\{d_i\}_{i=1}^{L}$ according to

$$ \max_{\{d_i\}_{i=1}^{L}} \sum_i P_i (1 - P_i)\, d_i^2 \qquad (32) $$

$$ \text{subject to} \quad \sum_i P_i d_i \le E \qquad (33) $$

$$ 0 \le d_i \le K N_t T\ \ \forall\, i. \qquad (34) $$

For a fixed $\{P_i\}_{i=1}^{L}$, this is also a strictly convex maximization problem over a polytope. Hence, the
global optimum is attained at a vertex of the polytope. The polytope constraint
set is exactly of the form considered in Lemma 4, which states that each vertex would consist of $L - 1$
entries that are either $K N_t T$ or 0, and at most one entry $c$ such that $0 < c < K N_t T$. For vertices for
which $\sum_i P_i d_i < E$, it is necessarily the case that all entries are either 0 or $K N_t T$.
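
For a fixed p.m.f., the problem (32)-(34) can therefore be solved by scanning the vertices characterized in Lemma 4. The sketch below does this for a small hypothetical instance ($Q$ plays the role of $K N_t T$) and compares the result with random feasible points; all names and numbers are illustrative assumptions.

```python
import itertools
import random

def candidate_vertices(P, E, Q):
    """Vertices per Lemma 4: entries in {0, Q}, plus at most one entry fixed by sum(P_i d_i) = E."""
    L = len(P)
    out = []
    for d in itertools.product([0.0, Q], repeat=L):
        if sum(p * x for p, x in zip(P, d)) <= E + 1e-12:
            out.append(list(d))
    for k in range(L):
        for rest in itertools.product([0.0, Q], repeat=L - 1):
            d = list(rest[:k]) + [0.0] + list(rest[k:])
            c = (E - sum(p * x for p, x in zip(P, d))) / P[k]
            if 0.0 < c < Q:          # power constraint active, entry k strictly inside (0, Q)
                d[k] = c
                out.append(d)
    return out

def objective(P, d):
    """Objective of (32)."""
    return sum(p * (1 - p) * x * x for p, x in zip(P, d))

# Hypothetical instance.
P, E, Q = [0.2, 0.2, 0.6], 1.0, 4.0
best = max(candidate_vertices(P, E, Q), key=lambda d: objective(P, d))
best_value = objective(P, best)

# Random feasible points should never beat the best vertex (convex objective on a polytope).
rng = random.Random(0)
samples = [[rng.uniform(0.0, Q) for _ in P] for _ in range(5000)]
feasible = [d for d in samples if sum(p * x for p, x in zip(P, d)) <= E]
random_best = max(objective(P, d) for d in feasible)
```

In this instance the best vertex has two entries in $\{0, Q\}$ and one interior entry, and no sampled feasible point exceeds its objective value.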

Consider the second stage of the optimization, which is over $\{P_i\}_{i=1}^{L}$. Following the result of the
optimization in the first stage, the structure of the optimal $d$ and the corresponding probabilities are of
the form

$$ d = [\,\underbrace{K N_t T\ \ \ldots\ \ K N_t T}_{M\ \text{times}}\ \ c\ \ 0\,]^T, \quad 0 \le c \le K N_t T, \qquad (35) $$

$$ P = [P_1\ \ldots\ P_M\ \ P_{M+1}\ \ P_{M+2}]^T, \qquad (36) $$

where $M$ denotes the number of entries in $d$ that are equal to $K N_t T$. Note that when $P_{M+1} = 0$, the
constellation point corresponding to the entry $c$ such that $0 < c < K N_t T$ is not transmitted. We know
that whenever (33) is strict, there cannot be an extreme point $d$ of the constraint set formed by (33)
and (34) which has an entry $c$ such that $0 < c < K N_t T$. Therefore, in the case of a strict half-plane
constraint, we will take $P_{M+1} = 0$ for the optimal constellation without any loss of generality, which
simplifies the subsequent convex minimization problem. The cardinality of the constellation $L$ depends
on the number of non-zero probabilities in the optimal constellation and is related to $M$ by $L \le M + 2$
in general.

With the structure of the optimal $d$, the optimal set of probabilities is determined next in terms of $M$
and $c$. Following that, the values of $M$ and $c$ are obtained that maximize the resulting objective function.
For convenience, consider minimizing the negative of the objective function in (32) after the optimal $d$ is
substituted as follows:

$$ \min_{\{P_i\}_{i=1}^{M+2}} \ -K^2 N_t^2 T^2 \sum_{i=1}^{M} P_i (1 - P_i) - c^2 P_{M+1} (1 - P_{M+1}) \qquad (37) $$

$$ \text{subject to} \quad K N_t T \sum_{i=1}^{M} P_i + c P_{M+1} \le E \qquad (38) $$

$$ \sum_{i=1}^{M+2} P_i = 1, \quad P_i \ge 0,\ 1 \le i \le M + 2. \qquad (39) $$

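The optimization over $P$ in (37) is the more commonly studied convex minimization problem [29]. The
Lagrangian can be written as

$$ \mathcal{L}(P, \beta, \lambda, \{\mu_i\}_{i=1}^{M+2}) = -K^2 N_t^2 T^2 \sum_{i=1}^{M} P_i (1 - P_i) - c^2 P_{M+1} (1 - P_{M+1}) + \beta \Big( \sum_{i=1}^{M+2} P_i - 1 \Big) $$
$$ \qquad + \lambda \Big( K N_t T \sum_{i=1}^{M} P_i + c P_{M+1} - E \Big) - \sum_{i=1}^{M+2} \mu_i P_i. \qquad (40) $$

It can be verified that Slater's conditions [29] are satisfied and hence, strong duality holds. Therefore, the
Karush-Kuhn-Tucker (KKT) conditions are both necessary and sufficient for the optimal solution $P$ and
are given as

$$ \lambda \ge 0, \quad \mu_i \ge 0\ \forall\, i, \quad K N_t T \sum_{i=1}^{M} P_i + c P_{M+1} \le E, $$
$$ \sum_{i=1}^{M+2} P_i = 1, \quad \lambda \Big( K N_t T \sum_{i=1}^{M} P_i + c P_{M+1} - E \Big) = 0, \quad \mu_i P_i = 0,\ 1 \le i \le M + 2, $$
$$ -K^2 N_t^2 T^2 (1 - 2 P_i) + \lambda K N_t T + \beta - \mu_i = 0, \quad 1 \le i \le M, $$
$$ -c^2 (1 - 2 P_{M+1}) + \lambda c + \beta - \mu_{M+1} = 0, $$
$$ \beta - \mu_{M+2} = 0. $$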

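By eliminating the slack variables $\mu_i$, we get

$$ K^2 N_t^2 T^2 (2 P_i - 1) + \lambda K N_t T + \beta \ge 0, \quad 1 \le i \le M \qquad (41) $$
$$ c^2 (2 P_{M+1} - 1) + \lambda c + \beta \ge 0 \qquad (42) $$
$$ \beta \ge 0 \qquad (43) $$
$$ \lambda \Big( K N_t T \sum_{i=1}^{M} P_i + c P_{M+1} - E \Big) = 0 \qquad (44) $$
$$ \beta P_{M+2} = 0 \qquad (45) $$
$$ \big( K^2 N_t^2 T^2 (2 P_i - 1) + \lambda K N_t T + \beta \big) P_i = 0, \quad 1 \le i \le M \qquad (46) $$
$$ \big( c^2 (2 P_{M+1} - 1) + \lambda c + \beta \big) P_{M+1} = 0 \qquad (47) $$
$$ P_i \ge 0, \quad 1 \le i \le M + 2 \qquad (48) $$
$$ \sum_{i=1}^{M+2} P_i = 1 \qquad (49) $$
$$ K N_t T \sum_{i=1}^{M} P_i + c P_{M+1} \le E. \qquad (50) $$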

From (46), it can be seen that $P_i$ can take one of two values, namely,

$$ P_i = 0 \quad \text{or} \quad P_i = \frac{1}{2} - \frac{\lambda K N_t T + \beta}{2 K^2 N_t^2 T^2}. \qquad (51) $$

Points with zero probability are redundant and since the optimal number $M$ is determined only later, it
may be assumed that the $M$ probabilities $P_i$ for $1 \le i \le M$ are the same and given in (51); we denote
these probabilities simply as "$P_i$". Similarly from (47), $P_{M+1}$ can take one of two values, namely,

$$ P_{M+1} = 0 \quad \text{or} \quad P_{M+1} = \frac{1}{2} - \frac{\lambda c + \beta}{2 c^2}. \qquad (52) $$

Four cases must be considered to find the solutions to the KKT conditions. Recall that $K N_t T \ge E$.
Case 1: $K N_t T \sum_{i=1}^{M} P_i + c P_{M+1} < E$, $P_{M+2} = 0$.

The strict inequality in (50) implies that $\lambda = 0$ from (44). Since the power constraint is a strict
inequality, we may take $P_{M+1} = 0$ from the discussion that follows (36). Therefore, $P_i = \frac{1}{M}$ is necessary
to satisfy (49). From (51), we obtain $\beta = \left(1 - \frac{2}{M}\right) K^2 N_t^2 T^2$. The condition $\beta \ge 0$ implies that $M \ge 2$. The
strict inequality in (50) together with $P_{M+1} = 0$ implies that this case holds when $K N_t T < E$, which is
never true. Therefore, this case does not occur.

Case 2: $K N_t T \sum_{i=1}^{M} P_i + c P_{M+1} < E$, $P_{M+2} > 0$.

The strict inequality in (50) implies that $\lambda = 0$ from (44). Since the power constraint is a strict
inequality, we may take $P_{M+1} = 0$ from the discussion that follows (36). Since $P_{M+2} > 0$, we have
$\beta = 0$ from (45). Therefore, $P_i = \frac{1}{2}$ from (51) and $P_{M+2} = \frac{1}{2}$. From (50), this case applies when
$K N_t T < 2E$ and $M = 1$.

Case 3: $K N_t T \sum_{i=1}^{M} P_i + c P_{M+1} = E$, $P_{M+2} > 0$.

Since $P_{M+2} > 0$, we must have $\beta = 0$ by (45). There are three sub-cases here, viz., (i) $P_i > 0$,
$P_{M+1} > 0$, (ii) $P_i > 0$, $P_{M+1} = 0$ and (iii) $P_i = 0$, $P_{M+1} > 0$. We first consider sub-case (i).
(i) Using the values $P_i = \frac{1}{2} - \frac{\lambda}{2 K N_t T}$ and $P_{M+1} = \frac{1}{2} - \frac{\lambda}{2c}$ from (51) and (52) in the power constraint
equality, we can solve for $\lambda$ as $\lambda = \frac{M K N_t T + c - 2E}{M + 1}$. Substituting this value of $\lambda$ in (51) and (52), we obtain

$$ P_i = \frac{K N_t T - c + 2E}{2 (M + 1) K N_t T} \qquad (53) $$

$$ P_{M+1} = \frac{M (c - K N_t T) + 2E}{2 c (M + 1)}. \qquad (54) $$

Using the above probabilities in the objective function $f$ given in (37), we observe that

$$ \frac{\partial^2 f}{\partial c^2} = -\frac{M}{2 (M + 1)} < 0, \qquad (55) $$

which means that $f$ is a concave function over $c$. Since $P_{M+1} \ge 0$, we get from (52) that $\lambda \le c$.
Therefore, the range of $c$ in this case is given by $\lambda \le c \le K N_t T$. Since the optimization of $f$ over $c$ is a
concave minimization problem, the minimum is either at $c = \lambda$ or $c = K N_t T$ by Lemma 2.
Choosing $c = \lambda$ gives $P_{M+1} = 0$ from (52), $\lambda = K N_t T - \frac{2E}{M}$ and therefore

$$ P_i = \frac{E}{M K N_t T}. \qquad (56) $$

Consequently, from (49) we get that

$$ P_{M+2} = 1 - \frac{E}{K N_t T}. \qquad (57) $$

If $c$ were instead chosen to be $K N_t T$, then from (53) and (54), $P_i = \frac{E}{(M+1) K N_t T}$, $P_{M+1} = \frac{E}{(M+1) K N_t T}$
and therefore $P_{M+2} = 1 - \frac{E}{K N_t T}$. Since we are yet to optimize over $M$, the above solution clearly is
identical to that obtained in (56) and (57). So we may choose $c = \lambda$ itself as the solution.

For $c = \lambda$, since $\lambda \ge 0$, this case requires $K N_t T \ge \frac{2E}{M}$. Moreover, the power constraint equality
requires that $K N_t T \ge E$. Hence, this sub-case solves the convex optimization problem for the cases
$K N_t T \ge E$, $M \ge 2$ and $K N_t T \ge 2E$, $M = 1$.

Even for sub-cases (ii) and (iii), it can be easily verified that we get essentially the same solutions as
the previous sub-case.

Case 4: $K N_t T \sum_{i=1}^{M} P_i + c P_{M+1} = E$, $P_{M+2} = 0$.

The cases $K N_t T \ge E$, $M \ge 2$ and $K N_t T \ge E$, $M = 1$ are solved completely through Cases 2 and
3. This is true because by strong duality, the constellations obtained in Cases 2 and 3 are both necessary
and sufficient for optimality. Moreover, since $K N_t T < E$ does not occur, we do not solve for Case 4
since we will get no new solutions or insights.

The last step is to find the best possible $M$. We revert to the problem which is a maximization of the
objective function $f$ for convenience. From Case 3, which yields the only pertinent solution for $T \ge 2$,
the objective function with the optimal probabilities given in (56) and (57) is

$$ f = K N_t T E \left( 1 - \frac{E}{M K N_t T} \right). \qquad (58) $$

Notice that $f$ is an increasing function of $M$, and $M$ needs to be chosen as large as possible. However,
if $M$ is chosen so that $M > T$, inequality (25) would be strict since it is not possible to make the
columns of all pairs of different constellation matrices orthogonal. Therefore, $M = T$ is optimal among
$M$ satisfying $M \le T$. When we take the limit as $M \to \infty$, we get an upper bound on the mutual
information which is not achievable (hence the strict inequality for the upper bound in (14)).

To complete the proof, notice that we may use the jointly optimal $P$ and $d$ with the structure of
constellation points given in (30, 31) so that the upper bound in (25) is achieved with equality when
$M \le T$. Therefore, the optimal constellations have been obtained for the case $M \le T$. When $M > T$,
we can obtain an upper bound on the maximum achievable mutual information by letting $M \to \infty$ in
(58) (and multiplying by the constant factor that relates (23) to $I_{\mathrm{low}}$). ■
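
The closed-form solution can be sanity-checked numerically. The sketch below evaluates the probabilities in (56) and (57) and the objective in (58) for a hypothetical set of parameters; the function names and values are assumptions made only for illustration.

```python
def storm_probabilities(K, Nt, T, E, M):
    """Probability of each of the M non-zero points, (56), and of the zero point, (57)."""
    A = K * Nt * T                  # energy of every non-zero constellation point
    return E / (M * A), 1.0 - E / A

def objective_58(K, Nt, T, E, M):
    """Closed-form objective (58) as a function of M."""
    A = K * Nt * T
    return A * E * (1.0 - E / (M * A))

# Hypothetical parameters with K*Nt*T >= E.
K, Nt, T, E, M = 2.0, 2, 4, 3.0, 4
A = K * Nt * T
p_on, p_off = storm_probabilities(K, Nt, T, E, M)

# Direct evaluation of the maximized objective: A^2 * sum_i P_i (1 - P_i) over the M points.
direct = A ** 2 * M * p_on * (1.0 - p_on)
values_over_M = [objective_58(K, Nt, T, E, m) for m in (1, 2, 4, 8)]
limit_value = A * E                 # value approached as M grows without bound
```

The probabilities sum to one, the average energy equals $E$, the direct evaluation matches (58), and the objective increases with $M$ towards $K N_t T E$ without reaching it.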
C. Spectral Efficiency

Consider the normalized energy per bit for reliable communications, which is given as

$$ \frac{E_b}{N_0} = \frac{P}{C(P)}, \qquad (59) $$

where $C(P)$ is the Shannon capacity for the channel in bits per dimension. For the case when $C(P)$ is
a non-decreasing concave function, it can be seen that (59) achieves its minimum value over all $P$ as
$P \to 0$. However, this is not true in the PPAPR-constrained case. Indeed, since the capacity is $O(P^2)$,
$\frac{E_b}{N_0} \to \infty$ as $P \to 0$. Therefore, it is not energy-efficient to operate at asymptotically low SNR in this
case. The mutual information of STORM at any SNR is

P(Y|X0 ' 



log 



(60) 



The expectations in (60) can be calculated using Monte-Carlo integration. Thus the normalized energy
per bit required for STORM can be determined as $P / I^{\mathrm{STORM}}(P)$ over the entire range of SNRs.
It can be seen through extensive simulations over a variety of cases that the minimum energy per bit
typically occurs at a low but non-vanishing SNR. STORM should hence be used in the vicinity of this
SNR, for maximum spectral efficiency. In the absence of the capacity of the noncoherent MIMO channel
at a general SNR however, there is no fair yardstick to compare the energy per bit of STORM against
that of the capacity achieving scheme.

IV. The peak-constrained case 

In this section, the peak-constrained problem is considered where the peak constraint $K$ in (3) is a fixed
constant, independent of the average power $P$. It can be shown by a simple time-sharing argument that
the channel capacity in this case is concave and non-decreasing in $P$. Therefore, the normalized energy
per bit given in (59) can be seen to attain its minimum value over all $P$ as $P \to 0$. Let us denote the
normalized minimum energy per bit for our channel model by $\left(\frac{E_b}{N_0}\right)_{\min}$, in keeping with common usage [6].
Since $C(P)$ is a non-decreasing function of $P$, it can be assumed without any loss of generality that the
average power constraint is $\frac{1}{T}\mathbb{E}[\mathrm{tr}(XX^*)] = P$ instead of $\frac{1}{T}\mathbb{E}[\mathrm{tr}(XX^*)] \le P$. The capacity function
(in bits/dimension) admits the following Taylor series expansion

$$ C(P) = \dot{C}(0)\, P \log_2 e + \frac{1}{2} \ddot{C}(0)\, P^2 \log_2 e + o(P^2), \qquad (61) $$

where $\dot{C}(0)$ and $\ddot{C}(0)$ are the first and second derivatives of $C(P)$ at $P = 0$, computed in nats/dimension. The
notation and units introduced above for $C(P)$, $\dot{C}(0)$ and $\ddot{C}(0)$ will be used in the rest of this paper. The
capacity per unit energy (in bits per joule) is the reciprocal of $\left(\frac{E_b}{N_0}\right)_{\min}$ and is equal to $\dot{C}(0) \log_2 e$ in the
peak-constrained case, and either metric can be considered to be a measure of energy efficiency. Therefore,
the notions of minimizing the energy per bit and maximizing the information rate per unit energy
will be used interchangeably. The minimization of energy per bit is considered in Section IV-A. Note
however that since this minimum occurs at a vanishing SNR, a fixed rate (in bits/sec) of communication
can only be achieved in the limit of infinite bandwidth. It is hence of interest to communicate at low
but non-vanishing SNR and also do so in a bandwidth efficient manner, which brings us to the notion of
wideband slope introduced in [6].

The slope of the capacity function versus $\frac{E_b}{N_0}$ (also called the spectral efficiency function) in bits per
second per hertz per 3 dB at zero spectral efficiency is defined as the wideband slope in [6] and was shown
to be given in terms of $\dot{C}(0)$ and $\ddot{C}(0)$ as

$$ S_0 = \frac{2 \left[ \dot{C}(0) \right]^2}{-\ddot{C}(0)}. \qquad (62) $$
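
As a quick illustration of these two quantities, the sketch below evaluates the minimum energy per bit, $\log_e 2 / \dot{C}(0)$, and the wideband slope for a reference case that is not taken from this paper: the complex AWGN channel with $C(P) = \log_e(1 + P)$ nats, for which $\dot{C}(0) = 1$ and $\ddot{C}(0) = -1$. The function names are hypothetical.

```python
import math

def min_energy_per_bit_db(c1):
    """(Eb/N0)_min in dB: the reciprocal of the capacity per unit energy c1 * log2(e)."""
    return 10.0 * math.log10(math.log(2.0) / c1)

def wideband_slope(c1, c2):
    """Wideband slope 2*c1^2 / (-c2), with c1, c2 the first two derivatives of C(P) at P = 0 in nats."""
    return 2.0 * c1 ** 2 / (-c2)

# Reference example (assumption for illustration): C(P) = ln(1 + P).
awgn_min_db = min_energy_per_bit_db(1.0)
awgn_slope = wideband_slope(1.0, -1.0)
```

For this reference channel the minimum energy per bit is about -1.59 dB and the wideband slope is 2 bits/s/Hz per 3 dB; a second derivative of larger magnitude lowers the slope for the same $\dot{C}(0)$.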



The motivation for considering the wideband slope as a performance metric is that, while achieving
$\left(\frac{E_b}{N_0}\right)_{\min}$ is desirable for energy efficiency, the rate of convergence of $\frac{E_b}{N_0}$ to $\left(\frac{E_b}{N_0}\right)_{\min}$ as $P \to 0$ is also an
important factor at low $P$, which in turn is closely tied to spectral efficiency. The higher the wideband
slope, the greater is the spectral efficiency when operating at small but non-vanishing SNR. This point
about the importance of the wideband slope was highlighted through several examples in the insightful
work of [6]. An important example provided there was that of noncoherent communications with an input
average power constraint alone, and the wideband slope in this case was found to be $S_0 = 0$, in contrast
to that of coherent communication where it is positive. This result implies that to approach $\left(\frac{E_b}{N_0}\right)_{\min}$, the
bandwidth for reliable noncoherent communications becomes prohibitively large and the associated
signaling scheme prohibitively peaky, and therefore no realistic (i.e., bandwidth-limited and peak-limited)
scheme can achieve $\left(\frac{E_b}{N_0}\right)_{\min}$.

In this work, the noncoherent MIMO channel is considered with a peak-constraint on the input, in
addition to the average power constraint. It is shown that with the additional peak-constraint, which is
necessary for meaningful results at low SNR, there is a tradeoff between the minimum energy per bit and
the wideband slope. This provides a far more detailed characterization of the wideband slope than if only
the average power constraint were imposed, and in particular it shows that it is possible to have $S_0 > 0$
provided the peak-constraint on the input is less than a certain constant. In the process, the $(T+1)$-point
constellation is derived in Section IV-B from among constellations that achieve minimum energy per bit
(or equivalently, $\dot{C}(0)$) that is optimal in wideband slope (or maximizes $\ddot{C}(0)$), which interestingly, turns
out to be STORM again. STORM is hence optimal in spectral efficiency in the wideband regime. Apart
from providing fundamental limits on peak-limited MIMO noncoherent communications, our results and
conclusions also temper the pessimistic conclusions that result from the consideration of noncoherent
communication under just an average power constraint [6].

A. Achieving minimum energy per bit 

In this section, the necessary and sufficient conditions for a constellation to achieve $(E_b/N_0)_{\min}$ are derived.
First, the following definition and lemma are needed from optimization theory [28, 29].

Definition 4: A function $f$ is strictly quasiconcave over a convex set $\mathcal{A}$ iff for any $x_1, x_2 \in \mathcal{A}$ and for
$0 < \theta < 1$,

$$f(\theta x_1 + (1-\theta) x_2) > \min\{f(x_1), f(x_2)\}. \tag{63}$$

Lemma 5: The global minimum of a strictly quasiconcave function $f$ over a compact convex set $\mathcal{A}$ is
attained at a point $x \in \mathcal{A}$ only if $x$ is an extreme point of $\mathcal{A}$.

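Theorem 3: Consider a constellation $\mathcal{C}$ with non-zero matrices $\{X_i\}_{i=1}^{L-1}$ and respective probabilities
$\{P_i\}_{i=1}^{L-1}$, and the zero matrix with probability $P_0$. Let $\mathcal{C}$ satisfy the average power constraint
$E[\mathrm{tr}(XX^*)] = PT = E$ and the peak-constraint (3) as in the peak-constrained problem. Then, $\mathcal{C}$
achieves the capacity per unit energy as $P \to 0$ iff its constellation matrices and respective probabilities
are of the following form

$$X_i = \sqrt{K}\, v_i w_i^*, \quad 1 \le i \le L-1 \tag{64}$$

$$X_0 = 0_{T \times N_t}, \tag{65}$$

$$\sum_{i=1}^{L-1} P_i = \frac{P}{K N_t}, \tag{66}$$

$$P_0 = 1 - \frac{P}{K N_t}, \tag{67}$$

where for each $i$, $v_i \in \mathbb{C}^{T \times 1}$, $w_i \in \mathbb{C}^{N_t \times 1}$ and $|[v_i w_i^*]_{mn}| = 1 \ \forall\, m, n$. The capacity per unit energy
achieved by the above constellation is

$$N_r \left(1 - \frac{\log(1 + K N_t T)}{K N_t T}\right) \log_2 e \ \ \text{bits/joule}. \tag{68}$$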

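Proof: Let the mutual information between $\mathcal{C}$ and the output $Y$ be denoted as $I(P)$ (in nats per
dimension). It is known from [30] that to achieve the capacity per unit energy, it is sufficient to use one
symbol apart from the zero energy symbol. Therefore, our formulation, which assumes a discrete input
with an arbitrary number of points, is without any loss of generality. The optimization problem that is to
be solved is given as

$$\max_{\{P_i\},\, \{X_i\}} \ \dot{I}(0) \tag{69}$$

$$\text{subject to } \sum_{i=0}^{L-1} P_i\, \mathrm{tr}(X_i X_i^*) = PT, \quad \|X_i\|_\infty \le \sqrt{K} \ \ \forall\, i, \quad \sum_{i=1}^{L-1} P_i = 1 - P_0, \quad P_i \ge 0 \ \ \forall\, i.$$

A general formula for $\dot{I}(0)$ was derived in [6] and is given as

$$\dot{I}(0) = \lim_{P \to 0} \frac{E_X\left[D(P_{Y|X} \,\|\, P_{Y|X=0})\right]}{E_X[\mathrm{tr}(XX^*)]}. \tag{70}$$

Since

$$\max_{\{P_i\},\, \{X_i\}} \ \lim_{P \to 0} \frac{E_X\left[D(P_{Y|X} \,\|\, P_{Y|X=0})\right]}{E_X[\mathrm{tr}(XX^*)]} \ \le\ \lim_{P \to 0} \ \max_{\{P_i\},\, \{X_i\}} \frac{E_X\left[D(P_{Y|X} \,\|\, P_{Y|X=0})\right]}{E_X[\mathrm{tr}(XX^*)]}, \tag{71}$$

an upper bound for the optimal value of the problem in (69) is

$$\lim_{P \to 0} \ \max_{\{P_i\},\, \{X_i\}} \frac{E_X\left[D(P_{Y|X} \,\|\, P_{Y|X=0})\right]}{E_X[\mathrm{tr}(XX^*)]} \tag{72}$$

$$\text{subject to } \sum_{i=0}^{L-1} P_i\, \mathrm{tr}(X_i X_i^*) = PT, \quad \|X_i\|_\infty \le \sqrt{K} \ \ \forall\, i, \quad \sum_{i=1}^{L-1} P_i = 1 - P_0, \quad P_i \ge 0 \ \ \forall\, i.$$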

The objective function in (72) can be evaluated as

$$\frac{E_X\left[D(P_{Y|X} \,\|\, P_{Y|X=0})\right]}{E_X[\mathrm{tr}(XX^*)]} = N_r \left(1 - \frac{\sum_{i=1}^{L-1} P_i \log\det(I + X_i X_i^*)}{\sum_{i=1}^{L-1} P_i\, \mathrm{tr}(X_i X_i^*)}\right). \tag{73}$$

Consequently, the problem that needs to be solved is

$$\min_{\{P_i\},\, \{X_i\}} \ \frac{\sum_{i=1}^{L-1} P_i \log\det(I + X_i X_i^*)}{\sum_{i=1}^{L-1} P_i\, \mathrm{tr}(X_i X_i^*)} \tag{74}$$

$$\text{subject to } \sum_{i=0}^{L-1} P_i\, \mathrm{tr}(X_i X_i^*) = PT, \quad \|X_i\|_\infty \le \sqrt{K} \ \ \forall\, i, \quad \sum_{i=1}^{L-1} P_i = 1 - P_0, \quad P_i \ge 0 \ \ \forall\, i.$$

Relaxing the peak constraint, the optimal value of the problem in (74) over the signal constellation (but
with the probabilities fixed) is lower bounded by the optimal value of the problem

$$\min_{\{X_i\}} \ \frac{\sum_{i=1}^{L-1} P_i \log\det(I + X_i X_i^*)}{\sum_{i=1}^{L-1} P_i\, d_i} \tag{75}$$

$$\text{subject to } \sum_{i=0}^{L-1} P_i d_i = PT, \quad \mathrm{tr}(X_i X_i^*) = d_i, \quad 0 \le d_i \le K N_t T \ \ \forall\, i.$$

The optimal values of problems (74) and (75) are the same iff the $\{X_i\}$ that solves (75) also satisfies
$\|X_i\|_\infty \le \sqrt{K} \ \ \forall\, i$.

As in the PPAPR constrained problem, the above problem can be solved as a two-stage optimization,
where in the first stage, the probabilities $\{P_i\}_{i=0}^{L-1}$ are fixed and the constellation $\{X_i\}_{i=1}^{L-1}$ is optimized.
In the second step, the resulting objective function is optimized over $\{P_i\}_{i=0}^{L-1}$.

Consider a fixed, feasible but otherwise arbitrary $\{P_i\}_{i=0}^{L-1}$. It can be verified that for each $i$,

$$\min_{\mathrm{tr}(X_i X_i^*) = d_i} \log\det(I + X_i X_i^*) = \log(1 + d_i), \tag{76}$$

is solved iff $X_i$ has unit rank.
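The claim in (76) can be checked numerically. The following is a minimal sketch, assuming only that $\log\det(I + X X^*)$ depends on $X$ through the eigenvalues of $X X^*$; the helper names and the value of `d` are hypothetical and chosen for illustration. It compares the rank-one choice (all energy in one eigenvalue) against random splits of the same trace.

```python
import math
import random

def logdet_from_eigs(eigs):
    """log det(I + X X*) when X X* has the given nonnegative eigenvalues."""
    return sum(math.log1p(lam) for lam in eigs)

def random_split(d, n, rng):
    """Split a total trace d into n nonnegative eigenvalues that sum to d."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [d * w / total for w in weights]

rng = random.Random(0)
d = 0.8  # hypothetical value of tr(X X*)

# Rank one: a single non-zero eigenvalue equal to d.
rank_one = logdet_from_eigs([d, 0.0, 0.0, 0.0])

# Higher rank: the same trace spread over four eigenvalues.
spread = [logdet_from_eigs(random_split(d, 4, rng)) for _ in range(1000)]

print(rank_one, min(spread))
```

Every higher-rank split gives a value no smaller than the rank-one value $\log(1 + d)$, which is the inequality $\prod_i (1 + \lambda_i) \ge 1 + \sum_i \lambda_i$ for nonnegative $\lambda_i$.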

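Therefore, the problem in (75) can be re-written as

$$\min_{d} \ \frac{\sum_{i=1}^{L-1} P_i \log(1 + d_i)}{\sum_{i=1}^{L-1} P_i\, d_i} \tag{77}$$

$$\text{subject to } \sum_{i=1}^{L-1} P_i d_i = PT, \quad 0 \le d_i \le K N_t T \ \ \forall\, i.$$

Let $d = [d_1 \ d_2 \ \ldots \ d_{L-1}]^T$. Consider the set

$$\mathcal{A}_t = \left\{ d : \ h(d) = \frac{\sum_{i=1}^{L-1} P_i \log(1 + d_i)}{\sum_{i=1}^{L-1} P_i\, d_i} \ge t, \ \ d_i \ge 0 \ \ \forall\, i, \ \ t > 0 \right\}. \tag{78}$$

Since $\sum_{i=1}^{L-1} P_i \log(1 + d_i) - t \sum_{i=1}^{L-1} P_i d_i$ is strictly concave for every real $t$, the set $\mathcal{A}_t$ is convex.
Therefore, considering any two points $d_1, d_2 \in \mathcal{A}_t$ where $t = \min\{h(d_1), h(d_2)\}$ and using Definition
4, $h(d)$ is a strictly quasiconcave function of $d$. Hence, from Lemma 5, the solution of (77)
is achieved at a vertex of the constraint set. Using Lemma 4, each vertex of the constraint set has
$L - 1$ entries, all but one of which are either $K N_t T$ or $0$, with the remaining entry $c$ satisfying $0 \le c \le K N_t T$.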

It can therefore be assumed, without loss of generality, that the optimal $d$ and the corresponding
probabilities are

$$d = [\,\underbrace{K N_t T \ \ \ldots \ \ K N_t T}_{M \text{ times}} \ \ c\,]^T, \quad 0 \le c \le K N_t T, \tag{79}$$

$$P = [P_1 \ \ldots \ P_M \ \ P_c \ \ P_0]^T, \tag{80}$$

where, for convenience, the symbol $M$ is introduced to denote the number of entries in $d$ that are equal to
$K N_t T$. Since the objective function is a symmetric function of $d$, the specific arrangement of the entries
is immaterial. Using this structure for $d$, the problem in (77) can be re-written and bounded from below
as

$$\min_{c,\, \{P_i\}} \ \frac{\sum_{i=1}^{M} P_i \log(1 + K N_t T) + P_c \log(1 + c)}{\sum_{i=1}^{M} P_i K N_t T + c P_c} \tag{81}$$

$$\text{subject to } 0 \le c \le K N_t T, \quad \sum_{i=1}^{M} P_i K N_t T + c P_c = PT, \quad \sum_{i=1}^{M} P_i + P_c = 1 - P_0$$

$$\ge \ \min_{c,\, \{P_i\}:\ 0 \le c \le K N_t T} \ \frac{\sum_{i=1}^{M} P_i \log(1 + K N_t T) + P_c \log(1 + c)}{\sum_{i=1}^{M} P_i K N_t T + c P_c}. \tag{82}$$

The problem in (82) is easily seen to be the minimization of a strictly quasiconcave function over $c$.
Therefore, the solution has to be among the vertices of $0 \le c \le K N_t T$, i.e., either $c = 0$ or $c = K N_t T$.
Notice that with either choice of $c$, the objective function is $\frac{\log(1 + K N_t T)}{K N_t T}$, and is independent of $\{P_i\}$.
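The endpoint behaviour of (82) can be illustrated numerically. The sketch below evaluates the objective of (82) on a grid of $c$ values; the numbers `D`, `p_full` and `p_c` are hypothetical, with `D` standing for $K N_t T$ and `p_full` for the total probability of the $M$ points at full energy.

```python
import math

def ratio(c, p_full, p_c, D):
    """Objective of (82): points at energy D with total probability p_full,
    plus one point at energy c with probability p_c."""
    numerator = p_full * math.log1p(D) + p_c * math.log1p(c)
    denominator = p_full * D + p_c * c
    return numerator / denominator

D = 0.6                     # hypothetical value of K * Nt * T
p_full, p_c = 0.02, 0.01    # hypothetical probabilities

endpoint = math.log1p(D) / D
grid = [D * k / 100 for k in range(0, 101)]
values = [ratio(c, p_full, p_c, D) for c in grid]

print(values[0], values[-1], min(values[1:-1]))
```

In this example both endpoints give $\log(1 + D)/D$ and every interior $c$ gives a strictly larger value, in line with the vertex argument above.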

Therefore, the upper bound on the optimal value of the problem in (69) is

$$N_r \left(1 - \frac{\log(1 + K N_t T)}{K N_t T}\right). \tag{83}$$

Since $d_i = K N_t T \ \forall\, i$, for equality to hold in the inequality leading to (75), it is necessary and sufficient
that the non-zero matrices $\{X_i\}_{i=1}^{L-1}$ be of the form

$$X_i = \sqrt{K}\, v_i w_i^* \quad \forall\, i, \tag{84}$$

where $v_i \in \mathbb{C}^{T \times 1}$, $w_i \in \mathbb{C}^{N_t \times 1}$ are such that $|[v_i w_i^*]_{mn}| = 1 \ \forall\, i, m, n$. By substituting (84) in (69),
a lower bound on the optimal value of (69) is obtained, which coincides with the upper bound in (83),
implying that (83) is the optimal value of the problem in (69). From the power constraint, $\sum_{i=1}^{L-1} P_i = \frac{P}{K N_t}$
must be true and $P_0 = 1 - \frac{P}{K N_t} > 0$. Therefore, it can be concluded that

$$\dot{C}(0) = N_r \left(1 - \frac{\log(1 + K N_t T)}{K N_t T}\right). \tag{85}$$

Note that the capacity per unit energy in (68) is independent of the number of points $L$. In particular, it
can be achieved with a 2-point constellation.
Corollary 3: The following two point constellation achieves the capacity per unit energy as the average
power $P \to 0$

$$(X_1, P_1) = \left(\sqrt{K}\, v w^*, \ \frac{P}{K N_t}\right) \tag{86}$$

$$(X_2, P_2) = \left(0_{T \times N_t}, \ 1 - \frac{P}{K N_t}\right) \tag{87}$$

where $v$ and $w$ are column vectors such that $|[v w^*]_{mn}| = 1 \ \forall\, m, n$. ■

The above 2-point constellation is referred to as MIMO-OOK (on-off keying). This constellation can
also be obtained directly through a simplified general formula for the capacity per unit energy derived in
[30]. It turns out that the simplified formula in [30] can be evaluated using similar techniques to those
used in the proof of Theorem 3, and is also a more direct approach than the derivation of the capacity per
unit energy in [31]. For the sake of completeness, it is given in Appendix C.

Clearly, Theorem 3 implies that there is a large class of constellations which achieve $(E_b/N_0)_{\min}$. For
instance, the cardinality can be any $L \ge 2$. Moreover, only the sum of probabilities of the non-zero
points is constrained to be $\frac{P}{K N_t}$, while the individual probabilities can be arbitrary. Further, there is no
restriction on the relationship between $X_i$ and $X_j$, $\forall\, j \ne i$. In particular, $X_i$ can be taken to be all equal
to a unit rank matrix $X$ with elements of equal magnitude (equal to $\sqrt{K}$) for all $i = 1, 2, \ldots, L - 1$.
In this case, the non-zero points would coincide and become one non-zero point with probability $\frac{P}{K N_t}$,
thereby reducing to the 2-point MIMO-OOK constellation of Corollary 3.
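To make (68) and the two-point constellation concrete, here is a minimal sketch. It assumes unit noise variance, so that the minimum energy per bit is taken as the reciprocal of the capacity per unit energy in bits/joule; the function names and the numerical values are hypothetical.

```python
import math

def capacity_per_unit_energy_bits(K, Nt, T, Nr):
    """Equation (68): Nr * (1 - log(1 + K*Nt*T) / (K*Nt*T)) * log2(e), in bits/joule."""
    x = K * Nt * T
    return Nr * (1.0 - math.log1p(x) / x) * math.log2(math.e)

def min_energy_per_bit_db(K, Nt, T, Nr):
    """Minimum energy per bit in dB, taken here as the reciprocal of (68)."""
    return 10.0 * math.log10(1.0 / capacity_per_unit_energy_bits(K, Nt, T, Nr))

def ook_probabilities(P, K, Nt):
    """Probabilities (P1, P2) of the on and off points of the two-point constellation."""
    p_on = P / (K * Nt)
    return p_on, 1.0 - p_on

# Normalized peak power K*Nt*T = 1 with T = 8, Nt = 2, Nr = 2.
print(capacity_per_unit_energy_bits(1 / 16, 2, 8, 2))

# Very large peak power with Nr = 1 approaches the classical -1.59 dB limit.
print(min_energy_per_bit_db(1e12, 1, 1, 1))

# Average power check: p_on * tr(X1 X1*) = p_on * K * Nt * T should equal P * T.
p_on, p_off = ook_probabilities(0.01, 1 / 16, 2)
print(p_on * (1 / 16) * 2 * 8, 0.01 * 8)
```

The on-probability scales linearly with $P$, so the constellation becomes increasingly sparse in time as $P \to 0$ while its peak power stays at $K$ per space-time slot.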



B. Maximizing the wideband slope

A key insight provided by [6] is that even though different schemes may achieve $(E_b/N_0)_{\min}$, an analysis of
their wideband slopes could reveal vast differences in the rate of growth of their energy efficiencies around
$(E_b/N_0)_{\min}$, and therefore differentiates their spectral efficiencies. The wideband slope, which is the measure
of spectral efficiency at low but non-vanishing SNR, is therefore critical in the analysis of wideband
channels. Our next aim is therefore to optimize the wideband slope over constellations which achieve
$(E_b/N_0)_{\min}$. The next theorem provides a formula for the wideband slope $S_0$ when evaluated for an arbitrary
generalized OOK constellation.

Theorem 4: Consider a constellation $\mathcal{C}$ with non-zero matrices $\{X_i\}_{i=1}^{M}$ and respective probabilities
$\{P_i\}_{i=1}^{M}$, and the zero matrix with probability $P_0$. Then

$$S_0 = \begin{cases} \dfrac{2 N_r^2 \left( \sum_{i=1}^{M} P_i\, \mathrm{tr}(X_i X_i^*) - \sum_{i=1}^{M} P_i \log\det(I + X_i X_i^*) \right)^2}{T \left[ \sum_{i=1}^{M} \sum_{j=1}^{M} \dfrac{P_i P_j}{|I - X_i X_i^* X_j X_j^*|^{N_r}} - (1 - P_0)^2 \right]}, & \text{if } I - X_i X_i^* X_j X_j^* \text{ is positive definite } \forall\, i, j \\[2ex] 0, & \text{otherwise.} \end{cases} \tag{88}$$

Proof: See Appendix D.

The following corollary indicates a fundamental limitation in approaching the capacity per unit energy
for a constellation of arbitrary cardinality.

Corollary 1: Consider a constellation $\mathcal{C}$ with non-zero matrices $\{X_i\}_{i=1}^{M}$ and respective probabilities
$\{P_i\}_{i=1}^{M}$, and the zero matrix with probability $P_0$. Let $\mathcal{C}$ satisfy the average and peak power constraints in
the statement of Theorem 3. Suppose $\mathcal{C}$ achieves the capacity per unit energy. Then the wideband slope
$S_0$ is $0$ when $K N_t T \ge 1$.

Proof: Since $\mathcal{C}$ achieves the capacity per unit energy, it satisfies the necessary conditions stated in
Theorem 3. From Theorem 4, the wideband slope is non-zero only when the matrix

$$I - X_i X_i^* X_j X_j^* \tag{89}$$

is positive definite for all pairs $(i, j)$. The proof of the corollary follows when the necessary conditions for
achieving the capacity per unit energy in Theorem 3 are substituted in (89) and simplified. ■
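The substitution mentioned in the proof can be written out for the diagonal pair $i = j$. The following is a short worked version, assuming only the rank-one form $X_i = \sqrt{K}\, v_i w_i^*$ of Theorem 3 with $|[v_i w_i^*]_{mn}| = 1$, which gives $(v_i^* v_i)(w_i^* w_i) = \|v_i w_i^*\|_F^2 = N_t T$:

```latex
\begin{align*}
X_i X_i^* &= K\,(w_i^* w_i)\, v_i v_i^*, \\
X_i X_i^* X_i X_i^* &= K^2 (w_i^* w_i)^2 (v_i^* v_i)\, v_i v_i^*, \\
\det\!\left(I - X_i X_i^* X_i X_i^*\right)
  &= 1 - K^2 (w_i^* w_i)^2 (v_i^* v_i)^2
   = 1 - K^2 N_t^2 T^2 .
\end{align*}
```

The matrix $I - X_i X_i^* X_i X_i^*$ is the identity minus a rank-one Hermitian matrix, so its eigenvalues are $1$ (with multiplicity $T - 1$) and $1 - K^2 N_t^2 T^2$. It is therefore positive definite only if $K N_t T < 1$, which is why the slope vanishes for $K N_t T \ge 1$.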

Theorem 5: Among all constellations of Theorem 3 which achieve $(E_b/N_0)_{\min}$ with $T + 1$ points, STORM
has the maximum wideband slope.

Proof: Since the constellations under consideration achieve $(E_b/N_0)_{\min}$, the numerator in (88) is a fixed
constant. Further, given the necessary conditions for the constellation to achieve $(E_b/N_0)_{\min}$, the denominator
of the wideband slope can be simplified as

$$\sum_{i=1}^{M} \frac{P_i^2}{(1 - P_0)^2} \frac{1}{(1 - K^2 N_t^2 T^2)^{N_r}} + \sum_{i=1}^{M} \sum_{j \ne i} \frac{P_i P_j}{(1 - P_0)^2} \frac{1}{|I - X_i X_i^* X_j X_j^*|^{N_r}}, \tag{90}$$

where the matrices $\{X_i\}_{i=1}^{M}$ are of unit rank with entries of equal magnitude $\sqrt{K}$, and $K N_t T < 1$ (to
ensure that $I - X_i X_i^* X_j X_j^*$ is positive semidefinite $\forall\, i, j$). Clearly, (90) is minimized when there exist
rank-one matrices $\{X_i\}_{i=1}^{M}$ such that $X_i^* X_j = 0 \ \forall\, i, j \ne i$. Such a set exists for $M \le T$, and is denoted
by $X_i = \sqrt{K}\, v_i w^*$ where the definitions for $v_i$ and $w$ are the same as in Theorem 2. The problem
that needs to be solved is thus

$$\min_{\{P_i\}:\ \sum_{i=1}^{M} P_i = 1 - P_0} \ \sum_{i=1}^{M} \frac{P_i^2}{(1 - P_0)^2} \left( \frac{1}{(1 - K^2 N_t^2 T^2)^{N_r}} - 1 \right). \tag{91}$$

The objective function in (91) can be easily shown to be a Schur-convex function [32] of $[P_1 \ P_2 \ \cdots \ P_M]$.
Hence, the minimum occurs when each of the probabilities $\{P_i\}_{i=1}^{M}$ is equal to $\frac{1 - P_0}{M}$. The optimal value
of (91) is therefore

$$\frac{1}{M} \left( \frac{1}{(1 - K^2 N_t^2 T^2)^{N_r}} - 1 \right). \tag{92}$$

Clearly, $M$ has to be made as large as possible, but to ensure achievability of the optimal value in (91), it
can be no greater than $T + 1$. Therefore, set $M = T + 1$. Evidently, the solution to (91) when $M > T + 1$
would provide an upper bound on the maximum wideband slope. ■

Theorem 5 establishes the optimality of STORM among $T + 1$ point constellations in the peak-
constrained case. This means that STORM is spectrally most efficient among all $T + 1$ (or fewer) point
constellations that achieve maximum capacity per unit energy in the low SNR regime.

The following corollary provides the wideband slopes of MIMO-OOK and STORM.

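Corollary 2: The wideband slopes of MIMO-OOK and STORM are, respectively,

$$S_0^{\mathrm{OOK}} = \begin{cases} \dfrac{2 N_r^2 \left( K N_t T - \log(1 + K N_t T) \right)^2}{T \left[ (1 - K^2 N_t^2 T^2)^{-N_r} - 1 \right]}, & \text{if } K N_t T < 1; \\[2ex] 0, & \text{if } K N_t T \ge 1. \end{cases} \tag{93}$$

$$S_0^{\mathrm{STORM}} = \begin{cases} \dfrac{2 N_r^2 \left( K N_t T - \log(1 + K N_t T) \right)^2}{(1 - K^2 N_t^2 T^2)^{-N_r} - 1}, & \text{if } K N_t T < 1; \\[2ex] 0, & \text{if } K N_t T \ge 1. \end{cases} \tag{94}$$

Proof: The wideband slopes follow by substituting the MIMO-OOK and STORM constellations in
the result of Theorem 4. ■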
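The factor-of-$T$ relation between the two slopes can be checked numerically. The sketch below is illustrative only: it assumes the generalized on-off form $S_0 = 2 \bar{D}^2 / (T \Delta)$ from Appendix D, rank-one points $X_i = \sqrt{K}\, v_i w^*$ whose vectors have unit-modulus entries, and DFT columns for the $v_i$; all function names are hypothetical. For such points, $\det(I - X_i X_i^* X_j X_j^*) = 1 - (K N_t)^2 |v_i^* v_j|^2$.

```python
import cmath
import math

def dft_column(T, k):
    """k-th column of the T-point DFT matrix (entries of unit modulus)."""
    return [cmath.exp(-2j * math.pi * k * t / T) for t in range(T)]

def inner(u, v):
    """Inner product u^* v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def wideband_slope(vs, probs, K, Nt, Nr):
    """S0 = 2 * Dbar^2 / (T * Delta) for points X_i = sqrt(K) v_i w^*.

    probs are the probabilities of the non-zero points conditioned on the
    input being non-zero, so they sum to one.
    """
    T = len(vs[0])
    x = K * Nt * T
    if x >= 1.0:
        return 0.0
    # Each non-zero point has tr(X X*) = x and log det(I + X X*) = log(1 + x).
    dbar = Nr * (x - math.log1p(x))
    # Pearson chi-square divergence between the on-output and the off-output.
    delta = -1.0
    for vi, pi in zip(vs, probs):
        for vj, pj in zip(vs, probs):
            det = 1.0 - (K * Nt) ** 2 * abs(inner(vi, vj)) ** 2
            delta += pi * pj * det ** (-Nr)
    return 2.0 * dbar ** 2 / (T * delta)

T, Nt, Nr = 8, 2, 2
K = 0.5 / (Nt * T)  # hypothetical peak level giving K*Nt*T = 0.5

ook = wideband_slope([dft_column(T, 0)], [1.0], K, Nt, Nr)
storm = wideband_slope([dft_column(T, k) for k in range(T)], [1.0 / T] * T, K, Nt, Nr)
print(ook, storm, storm / ook)
```

With $T$ orthogonal, equiprobable non-zero points the cross terms of the divergence contribute nothing beyond the constant, which is where the gain of $T$ over a single non-zero point comes from.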
C. Remarks 

Since STORM was obtained as the optimal constellation even in the PPAPR constrained case, many 
of the remarks on STORM following Theorem 2 and in Section III-A apply even to the peak-constrained 
case. Here we only state new insights pertinent to the peak-constrained case. 

1. From (68), it is seen that $\lim_{K \to \infty} \dot{C}(0) = N_r$. Therefore, for asymptotically large peak-powers, the
well known result on the capacity per unit energy with only an average power constraint [6], which is
common to both coherent and noncoherent MIMO channels, is recovered. Indeed, when $N_r = 1$, we
obtain the minimum energy to transmit one bit of information to be −1.59 dB, which is a classical result.
By relaxing the peak-constraint, STORM can be seen to be optimal even for the case when there is merely
an average power constraint (or with respect to infinite bandwidth capacity).

2. When the signals are just subject to an average power constraint, it is shown in [6] that $S_0 = 0$ for
the noncoherent MIMO channel. Therefore, signals whose energy per bit approaches $(E_b/N_0)_{\min}$ would have
to have bandwidths that become prohibitively large. However, when there is an additional peak-power
constraint $K$ which is a fixed constant, and for the case when the normalized peak power $K N_t T < 1$,
Corollary 2 shows that $S_0$ is strictly positive. Hence, it is realistic to design signals that achieve the
$(E_b/N_0)_{\min}$ scenario for low but non-vanishing SNR. Similar insights were also noted in [12] but in the
simpler context of the SISO Rician fading channel with unit block length under peak and average power
constraints.

3. While both MIMO-OOK and STORM achieve $(E_b/N_0)_{\min}$, according to Corollary 2, the wideband slope
of STORM is higher by a factor of $T$. This means that at a certain energy per bit and for the same
transmission rate, and as SNR $\to 0$, the bandwidth needed by STORM for the same spectral efficiency is
less than that of MIMO-OOK by a factor of $T$. Given typical values of the coherence time $T$, this higher
spectral efficiency of STORM can translate into huge savings. To give a sense of the significant gains,
Figures 1 and 2 plot the spectral efficiency vs. the energy per bit for STORM and MIMO-OOK.

4. Figures 3 and 4 plot the energy per bit and wideband slope of STORM vs. the normalized peak power
$K N_t T$, for different values of $N_r$. As the normalized peak power increases, it is seen that $(E_b/N_0)_{\min}$
decreases. This is expected as peakier signaling is more energy efficient. However, as the normalized
peak power gets close to 1, the wideband slope approaches 0. In fact, the wideband slope attains its
maximum at an intermediate value between 0 and 1 (say $K N_t T = c^*$). Since for any point in the region
$0 < K N_t T < c^*$ there is a point corresponding to $c^* < K N_t T < 1$ with lower $(E_b/N_0)_{\min}$ and the same
wideband slope, it makes most sense to operate in the region $c^* < K N_t T < 1$. Assuming only an
average power constraint, the analysis in [6] shows that $S_0 = 0$ for noncoherent communications. The
scheme that achieves the $(E_b/N_0)_{\min}$ has the non-zero signals migrating to $\infty$ in amplitude as $P \to 0$. The
results in [6] show in effect that it is unrealistic to realize the peak-unconstrained minimum energy per
bit (STORM having zero wideband slope for all $K N_t T \ge 1$ is clearly a stronger statement). Under
realistic assumptions on the peak-constraint however, it has been shown here that $S_0 > 0$ is possible
when $K N_t T < 1$. Moreover, a sharp characterization is provided which shows that there is a tradeoff
between $(E_b/N_0)_{\min}$ and $S_0$ for STORM in the region $c^* < K N_t T < 1$.

5. For the same number of bits transmitted reliably per joule at low SNR, MIMO-OOK requires an
operating SNR which is $10 \log_{10} T$ dB smaller than that of STORM. This can be seen from the fact that
the wideband slope of STORM is $T$ times that of MIMO-OOK and that mutual information per joule is
given as

$$\frac{I(P)}{P} = \dot{I}(0)(\log_2 e) + \frac{1}{2} \ddot{I}(0)(\log_2 e) P + o(P), \tag{95}$$

and the wideband slope is $S_0 = -\frac{2 [\dot{I}(0)]^2}{\ddot{I}(0)}$. Now, since the peak-power is a fixed constant, this implies that
the PAPR of MIMO-OOK at any small but non-vanishing SNR would be greater than that of STORM by a
factor of $T$. Since in the low SNR regime, peakiness of the signal constellations is a crucial factor, using
STORM can potentially result in large reductions in the required PAPR and facilitate implementation.
These large savings are illustrated in Figure 5, where the approximation of $\frac{I(P)}{P}$ vs. $P$ is plotted for
STORM and MIMO-OOK. In the example shown, the convergence to the capacity per unit energy is
faster for STORM by a factor of $10 \log_{10} 8 \approx 9$ dB relative to MIMO-OOK.

6. It has been shown in Corollary 1 that whenever $K N_t T \ge 1$, the wideband slope is 0. Therefore,
even though the noncoherent capacity per unit energy is $N_r \log_2 e$ bits/joule, it is prohibitively expensive
(in terms of bandwidth) to reliably transmit at any rate more than the peak-constrained capacity per unit
energy evaluated at $K N_t T = 1$, which from equation (68) is $N_r(\log_2 e - 1)$ bits/joule. Hence, the
capacity per unit energy at $K N_t T = 1$ can be taken to be the realistic limit for noncoherent MIMO
communication. Note that this limit is also $N_r$ bits/joule smaller than the coherent capacity per unit
energy. Since the analysis of the noncoherent channel neither assumes any particular scheme for channel
estimation nor does it ignore the resources for (implicit) channel estimation, the realistic capacity per
unit energy of $N_r(\log_2 e - 1)$ bits/joule can be argued as being more fundamental than the coherent
capacity per unit energy of $N_r \log_2 e$ bits/joule. The difference between the two can be thought of as the
fundamental or minimal cost of (implicit) channel estimation.

7. The dependence of $(E_b/N_0)_{\min}$ on $K$, $N_t$ and $T$ is only through the product $K N_t T$. So, increasing one or
more of these quantities has the effect of lowering $(E_b/N_0)_{\min}$. However, this effect is beneficial when $K N_t T < c^*$,
and beyond that the tradeoff between energy efficiency and bandwidth efficiency is quantified here,
which allows a designer to choose a suitable operating point. To illustrate this point, Figure 6 plots the
approximation of $\frac{I(P)}{P}$ vs. $P$ for different values of $K N_t T$. It is evident from Figure 6 that even as
$K N_t T$ gets close to one, the bits transmitted reliably per joule converge to the capacity per unit energy
at much smaller SNRs (and hence larger bandwidths). Since the PAPR of STORM at SNR $P$ is $\frac{K N_t}{P}$, it is
interesting to note that when $K N_t T$ is fixed, increasing $T$ decreases the PAPR required for the same
energy per bit, which is an advantage in practice. Increasing $N_t$ with $K N_t T$ fixed decreases $K$ and
therefore reduces the peak-power per antenna and time slot (though not changing the PAPR), which may
also be helpful in practice.

8. An interesting observation from Figures 3 and 4 is that using more receive antennas always lowers
$(E_b/N_0)_{\min}$, but does not always increase the wideband slope. Figure 7 illustrates that while the
approximation of $\frac{I(P)}{P}$ increases with $N_r$ in general, the convergence to the capacity per unit energy occurs more
slowly and hence a lower SNR is needed to operate close to it as $N_r$ increases.

9. Even though the optimal scheme for a cardinality more than T + 1 is yet unknown, STORM offers a 
concrete solution whose structure is also simple and practical. In [6], the positive impact on the wideband 
slope of using constellations with cardinality greater than two is illustrated via several contexts other than 
under the noncoherent assumption. Even so, following [30] and due to analytical convenience, many 
recent papers [12, 31, 33] in noncoherent communications focus on the two point ON-OFF scheme to 
achieve the capacity per unit energy. The results in this section demonstrate that there are compelling 
reasons to look beyond the two point ON-OFF scheme in the low SNR regime. 

10. Recently, [34-37] have investigated the possibility of channel coherence length scaling with SNR, 
so as to diminish the cost of acquiring channel knowledge. It should be interesting to pose and solve the 
optimization problems of this work under such scenarios. 
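The tradeoff described in Remarks 4 and 7 can be explored numerically. The sketch below is only illustrative: it assumes the STORM wideband slope has the form $2 N_r^2 (x - \log(1+x))^2 / ((1 - x^2)^{-N_r} - 1)$ with $x = K N_t T$, as obtained from the generalized on-off expressions of Appendix D, and takes the minimum energy per bit as the reciprocal of (68) under unit noise variance. The grid resolution and function names are hypothetical.

```python
import math

def storm_slope(x, Nr):
    """Assumed wideband slope of STORM versus x = K*Nt*T (zero for x >= 1)."""
    if x >= 1.0:
        return 0.0
    return 2.0 * Nr ** 2 * (x - math.log1p(x)) ** 2 / ((1.0 - x * x) ** (-Nr) - 1.0)

def energy_per_bit_db(x, Nr):
    """Minimum energy per bit in dB, taken as the reciprocal of (68)."""
    bits_per_joule = Nr * (1.0 - math.log1p(x) / x) * math.log2(math.e)
    return -10.0 * math.log10(bits_per_joule)

Nr = 2
grid = [k / 1000 for k in range(1, 1000)]

# Grid search for the normalized peak power c* that maximizes the slope.
c_star = max(grid, key=lambda x: storm_slope(x, Nr))

print(c_star, storm_slope(c_star, Nr))
for x in (0.3, c_star, 0.9):
    print(x, energy_per_bit_db(x, Nr), storm_slope(x, Nr))
```

Under these assumptions the slope vanishes at both ends of the interval and peaks at an interior point, while the energy per bit keeps decreasing as the normalized peak power grows, which is the tradeoff region $c^* < K N_t T < 1$ discussed above.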

V. Conclusion 

We pose two important problems on reliable communications over noncoherent MIMO spatially i.i.d.
Rayleigh fading channels at low SNR. In both formulations, we assume an average-power constraint on
the input and a natural per-antenna, per-time slot peak-power constraint. In the first problem formulation,
the peak-power to average-power ratio is held fixed (PPAPR-constrained) and the mutual information,
which grows as $O(\mathrm{SNR}^2)$, is maximized up to second order jointly over input signal matrices and their
respective probabilities, when the cardinality of the constellation is no greater than $T + 1$ ($T$ is the coherence
blocklength). In the second problem formulation (peak-constrained), the peak-power is a fixed constant
independent of SNR. Here, necessary and sufficient conditions for a constellation of any cardinality to
achieve the minimum energy/bit are derived. Over the set of all $T + 1$ point constellations which achieve
the minimum energy/bit, we optimize the second order behaviour of mutual information. The resulting
constellations are both first and second order optimal among all $T + 1$ point constellations. Both the
PPAPR-constrained and peak-constrained problems result in finite dimensional non-convex optimization
problems. Even so, they admit elegant solutions in closed form, which are identical in both formulations.
We refer to this common solution as Space Time Orthogonal Rank-one Modulation (STORM), and it
provides several new insights on noncoherent communications at low SNR.

In the PPAPR-constrained case, we show that the $T + 1$ point STORM is near-optimal with respect
to the maximum mutual information up to second order with unconstrained cardinality even for modest
values of $T$ and PAPR. Therefore, there is not much to be gained by using more than $T + 1$ points in
the PPAPR-constrained case. In the peak-constrained case, our approach enables us to provide a sharp
characterization of the first and second order behavior of noncoherent MIMO capacity, which also sheds
light on the cost of implicit estimation of channel state information in the low SNR regime. The energy/bit
and the wideband slope achieved by STORM also reveal a fundamental energy-vs-bandwidth efficiency
tradeoff that enables the determination of the operating (low) SNR and peak power most suitable for a
given application. Moreover, while the more conventional MIMO On-Off Keying (OOK) also achieves
the minimum energy per bit, STORM has a wideband slope that is $T$ times greater, which translates into
an increase in bandwidth efficiency (or a decrease in the PAPR) by a factor of $T$ in the wideband regime.
Given typical values of the coherence blocklength $T$, these gains are potentially huge.

Appendix

A. Proof of non-convexity 

A simple argument is given to show that (25) is a non-convex optimization problem.
We need the following definition of matrix convexity.

Definition 5: A function $f : \mathbb{S}^{n}_{+} \to \mathbb{S}^{m}$ is matrix convex with respect to matrix inequality if for
any positive semidefinite $X_1, X_2$ and for any $\theta \in [0, 1]$

$$f(\theta X_1 + (1 - \theta) X_2) \preceq \theta f(X_1) + (1 - \theta) f(X_2). \tag{96}$$
Since $\{X_i\}$ is a set of complex matrices, the optimization over the signals amounts to an equivalent
joint optimization over the real and imaginary parts of $X_i$ given by $X_i = X_i^R + j X_i^I$, $\forall\, i$. In order to
show that this joint optimization is non-convex, we will consider the contour given by $X_i^I = 0$, $\forall\, i$. With
the imaginary parts being zero, the function in (25) becomes $\sum_i P_i (1 - P_i)\, \mathrm{tr}\left(X_i^R X_i^{R*} X_i^R X_i^{R*}\right)$.

It can be seen that $g(X) = XX^*$ is matrix-convex over $X$, and $h(A) = \mathrm{tr}(AA^*)$ is a non-decreasing
convex function over positive semidefinite matrices $A$. Therefore, the composition $f(X) = h \circ g =
\mathrm{tr}(XX^*XX^*)$ is a convex function over $X$ [29]. Further, since $\mathrm{tr}(XX^*)$ and $\|X\|_\infty$ are convex
functions of $X$ [29], the constraints $\sum_i P_i\, \mathrm{tr}(X_i X_i^*) \le E$ and $\|X_i\|_\infty \le \sqrt{K}$ define convex sets in $\{X_i\}$. For
an arbitrary but fixed set of probabilities $\{P_i\}$, the objective function is convex in $\{X_i\}$, while the
constraint set is the intersection of convex sets and is hence convex. Therefore, the problem of optimizing
(25) over $\{X_i\}$ is a convex maximization problem and not a convex optimization problem. Since for
a fixed $\{P_i\}$, the problem of optimizing over $\{X_i\}$ is a non-convex optimization problem for the
imaginary parts of $X_i$ fixed, the joint optimization over $\{P_i\}$ and $\{X_i\}$ is also non-convex.

B. A low complexity block decoder 

In some applications, decoding of a block of symbols at a time may be required. This need arises for
instance in uncoded systems, where there is no coding across blocks. Another possibility is when there is
coding across blocks, but hard decision decoding is employed at the receiver so that the blocks of symbols
are first decoded via the MAP rule, following which the outer code is decoded. In all such cases, we show
in this section that the optimal MAP decoding of STORM can be simplified using Fast Fourier Transform
(FFT) or Fast Hadamard Transform (FHT) algorithms.

Consider the $T + 1$ point STORM as described in (11) and (12). Let the received signal matrix be
$R \in \mathbb{C}^{T \times N_r}$. The optimal MAP rule to decode a block at the receiver is

$$\hat{j} = \arg\max_{j} \ P_j\, p(R \mid X_j) \tag{97}$$

$$= \arg\max_{j} \ P_j\, \frac{\exp\left\{ -\mathrm{tr}\left( R^* (I_T + X_j X_j^*)^{-1} R \right) \right\}}{\pi^{T N_r}\, |I_T + X_j X_j^*|^{N_r}}. \tag{98}$$

For convenience, we will first find the maximum in (98) among the non-zero signal matrices, and then
compare it with the metric for the zero matrix. Substituting STORM that is defined with permutation
matrix $P$, we get that the maximum metric among non-zero matrices is

$$\max_{i = 1, \ldots, T} \ \frac{E}{K N_t T^2}\, \frac{\exp\left\{ -\mathrm{tr}\left( Y^* (I_T + K N_t\, v_i v_i^*)^{-1} Y \right) \right\}}{\pi^{T N_r}\, |I_T + K N_t\, v_i v_i^*|^{N_r}}, \tag{99}$$

where $Y = P^* R$ is a sufficient statistic, which is simply the received matrix with the permutation
removed. The term $(I_T + K N_t\, v_i v_i^*)^{-1}$ can be simplified by applying Woodbury's identity, i.e.,
using

$$(A + BCD)^{-1} = A^{-1} - A^{-1} B \left( C^{-1} + D A^{-1} B \right)^{-1} D A^{-1}.$$

Also, using the identity $|I + AB| = |I + BA|$, (99) becomes

$$\max_{i = 1, \ldots, T} \ \frac{E \exp\{ -\mathrm{tr}(Y Y^*) \}}{\pi^{T N_r}\, K N_t T^2\, (1 + K N_t T)^{N_r}}\, \exp\left\{ \frac{K N_t}{1 + K N_t T}\, \mathrm{tr}\left( Y^* v_i v_i^* Y \right) \right\}. \tag{100}$$

Clearly, among the non-zero constellation matrices, the MAP metric is maximized when $\|Y^* v_i\|^2$ is
maximized. Let $V$ be the $T$ dimensional DFT or Hadamard matrix. Then each row of the matrix $Y^* V$
would represent the DFT or Hadamard transform of the corresponding row of $Y^*$. The non-zero
constellation matrix with the maximum MAP metric would therefore correspond to the column of $Y^* V$ with the
maximum $\ell_2$-norm. The $N_r$ DFTs or Hadamard transforms involved can be efficiently computed using
fast algorithms (FFTs and FHTs). Now, the metric corresponding to the zero matrix would be

$$\left( 1 - \frac{E}{K N_t T} \right) \frac{\exp(-\mathrm{tr}(Y Y^*))}{\pi^{T N_r}}. \tag{101}$$

Since this is a constant for a given received signal, we can divide the metric in (100) by (101) and then
take the natural logarithm of the resulting expression so that

$$\ell_i = \log\left\{ \frac{E}{T (K N_t T - E)(1 + K N_t T)^{N_r}} \right\} + \frac{K N_t}{1 + K N_t T}\, \mathrm{tr}\left( Y^* v_i v_i^* Y \right). \tag{102}$$

Now letting $\hat{i} = \arg\max_{k = 1, \ldots, T} \|Y^* v_k\|^2$, the final simplified decoding rule can be given as

$$\hat{j} = \begin{cases} \hat{i} & \text{if } \ell_{\hat{i}} > 0 \\ T + 1 & \text{if } \ell_{\hat{i}} \le 0. \end{cases} \tag{103}$$
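The decoding rule (102)-(103) can be sketched in a few lines for the Hadamard case. This is a minimal illustration under stated assumptions, not the authors' implementation: $T$ is a power of two, the $v_k$ are columns of the natural-order (Sylvester) Hadamard matrix, the permutation has already been removed from $Y$, and indices are zero-based so that the value `T` denotes the zero matrix. The function names and numerical values are hypothetical.

```python
import math

def hadamard_column(T, k):
    """k-th column of the T x T natural-order Hadamard matrix (entries +1/-1)."""
    return [(-1) ** bin(k & t).count("1") for t in range(T)]

def fht(x):
    """Fast Walsh-Hadamard transform of a list whose length is a power of two."""
    a = list(x)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for k in range(start, start + h):
                a[k], a[k + h] = a[k] + a[k + h], a[k] - a[k + h]
        h *= 2
    return a

def decode_block(Y, K, Nt, E):
    """Simplified MAP rule (103) for Hadamard-based STORM.

    Y is a T x Nr list of lists. Returns an index in 0..T-1 for a non-zero
    point, or T for the zero matrix.
    """
    T, Nr = len(Y), len(Y[0])
    x = K * Nt * T
    energy = [0.0] * T
    for n in range(Nr):
        # Transforming one column of Y gives v_k^* y_n for every k at once.
        column = fht([Y[t][n] for t in range(T)])
        for k in range(T):
            energy[k] += abs(column[k]) ** 2
    best = max(range(T), key=lambda k: energy[k])
    # Log-likelihood ratio (102) of the best non-zero point against the zero matrix.
    bias = math.log(E / (T * (x - E) * (1.0 + x) ** Nr))
    metric = bias + K * Nt / (1.0 + x) * energy[best]
    return best if metric > 0 else T

T, Nr, Nt = 8, 2, 2
K, E = 0.5 / (Nt * T), 0.05  # hypothetical peak level and block energy

# A strong received block aligned with column 3, and an all-zero block.
Y_on = [[3.0 * v] * Nr for v in hadamard_column(T, 3)]
Y_off = [[0.0] * Nr for _ in range(T)]
print(decode_block(Y_on, K, Nt, E), decode_block(Y_off, K, Nt, E))
```

Each receive antenna needs one length-$T$ transform, so the search over the $T$ non-zero points costs on the order of $N_r T \log T$ operations instead of $N_r T^2$.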

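C. Derivation of MIMO-OOK

Theorem 6: The capacity per unit energy (in nats/joule) for the i.i.d. MIMO block Rayleigh fading
channel with a peak power constraint on the input signal $\|X\|_\infty \le \sqrt{K}$ is

$$\dot{C}(0) = N_r \left( 1 - \frac{\log(1 + K N_t T)}{K N_t T} \right), \tag{104}$$

and is achieved as $P \to 0$ by the two point constellation given as

$$(X_1, P_1) = \left( \sqrt{K}\, v w^*, \ \frac{P}{K N_t} \right) \tag{105}$$

$$(X_2, P_2) = \left( 0_{T \times N_t}, \ 1 - \frac{P}{K N_t} \right), \tag{106}$$

where $v \in \mathbb{C}^{T \times 1}$, $w \in \mathbb{C}^{N_t \times 1}$ and $|[v w^*]_{mn}| = 1 \ \forall\, m, n$.

Proof: From [30], it is known that to achieve the channel capacity per unit energy, it is enough to
transmit one non-zero symbol, given in (105), apart from the symbol 0. Since we are dealing with a
memoryless, discrete and matrix input channel (1) with the cost-function given by $b(X) = \mathrm{tr}(XX^*)$, the
capacity per unit energy under a fixed peak power constraint is given by [30]

$$\dot{C}(0) = \sup_{X \ne 0,\ \|X\|_\infty \le \sqrt{K}} \ \frac{D(P_{Y|X} \,\|\, P_{Y|X=0})}{\mathrm{tr}(XX^*)}. \tag{107}$$

Using the expression for the Kullback-Leibler distance which can be obtained easily (c.f. [17]), we obtain

$$\dot{C}(0) = \sup_{X \ne 0,\ \|X\|_\infty \le \sqrt{K}} \ N_r \left( 1 - \frac{\log\det(I_T + XX^*)}{\mathrm{tr}(XX^*)} \right) \tag{108}$$

$$= \sup_{X \ne 0,\ d:\ \mathrm{tr}(XX^*) = d,\ \|X\|_\infty \le \sqrt{K}} \ N_r \left( 1 - \frac{\log\det(I_T + XX^*)}{d} \right). \tag{109}$$

Let the matrix $XX^*$ have eigenvalues $\{\lambda_i\}_{i=1}^{T}$. Then (109) can be upper bounded as

$$\dot{C}(0) \le \sup_{\sum_i \lambda_i = d,\ d \le K N_t T} \ N_r \left( 1 - \frac{\sum_i \log(1 + \lambda_i)}{d} \right) \tag{110}$$

$$= \sup_{d \le K N_t T} \ N_r \left( 1 - \frac{\log(1 + d)}{d} \right) \tag{111}$$

$$= N_r \left( 1 - \frac{\log(1 + K N_t T)}{K N_t T} \right). \tag{112}$$

The expression in (111) is obtained by noting that since $-\sum_i \log(1 + \lambda_i)$ is a convex function of
$[\lambda_1 \ \lambda_2 \ \ldots \ \lambda_T]^T$, the supremum in (110) is achieved at the extreme point $[d \ 0 \ \ldots \ 0]^T$ by Lemma
2. Since $1 - \frac{\log(1 + d)}{d}$ is a monotonically increasing function of $d$, we obtain (112) by substituting
the maximum value of $d$. The inequality in (110) is achieved with equality when $X$ is of unit rank,
$\mathrm{tr}(XX^*) = d$ and $\|X\|_\infty \le \sqrt{K}$. The supremum in (111) is achieved when $d = K N_t T$, and the unit
rank $X$ satisfies both $\mathrm{tr}(XX^*) = K N_t T$ as well as $\|X\|_\infty \le \sqrt{K}$, which in turn is true iff it is of the
form given in (105). To satisfy the average power constraint, set $P_1 = \frac{P}{K N_t}$. ■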

D. Proof of Theorem 4 

The results regarding generalized on-off signaling given in [6] are employed. In particular, note that
Theorem 10 in [6] provides the $(E_b/N_0)_{\min}$ and $S_0$ achieved by a generalized on-off signaling scheme. For
convenience, that result is summarized here.

The generalized on-off signaling scheme has a $P_0$ mass at the all-zero matrix $0_{T \times N_t}$. The input pdf
conditioned on the input being nonzero is denoted by $\bar{P}_X$, with distribution $\bar{F}_X$. With the input pdf
conditioned on the all-zero matrix given by $P_X^{0}$, the input pdf is

$$P_X = P_0\, P_X^{0} + (1 - P_0)\, \bar{P}_X. \tag{113}$$

Denoting the pdf of the output conditioned on the input by $P_{Y|X}$, the output pdf corresponding to $\bar{P}_X$ is
given by

$$\bar{P}_Y = \int P_{Y|X=x}\, d\bar{F}_X(x). \tag{114}$$
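The wideband slope $S_0$ achieved by generalized on-off signaling is

$$S_0 = \frac{2 \left( E_{\bar{P}_X}\left[ D(P_{Y|X} \,\|\, P_{Y|X=0}) \right] \right)^2}{T\, \Delta(\bar{P}_Y \,\|\, P_{Y|X=0})}, \tag{115}$$

where $\Delta(\cdot \,\|\, \cdot)$ denotes Pearson's $\chi^2$-divergence and is defined as

$$\Delta(\bar{P}_Y \,\|\, P_{Y|X=0}) = E_{P_{Y|X=0}}\left[ \left( \frac{\bar{P}_Y}{P_{Y|X=0}} - 1 \right)^2 \right]. \tag{116}$$

For the channel model under consideration in this paper, we have

$$\frac{\bar{P}_Y}{P_{Y|X=0}} = \sum_{i=1}^{M} \frac{P_i}{(1 - P_0)\, \pi^{T N_r}\, |I + X_i X_i^*|^{N_r}} \cdot \frac{\exp\left\{ -\mathrm{tr}\left( Y^* (I + X_i X_i^*)^{-1} Y \right) \right\}}{\exp(-\mathrm{tr}(Y^* Y)) / \pi^{T N_r}},$$

and using the above expressions in (116), one obtains

$$\Delta(\bar{P}_Y \,\|\, P_{Y|X=0}) = E_{P_{Y|X=0}}\Bigg[ \sum_{i=1}^{M} \frac{P_i^2}{(1 - P_0)^2} \frac{e^{2\, \mathrm{tr}\left( Y^* (I - (I + X_i X_i^*)^{-1}) Y \right)}}{|I + X_i X_i^*|^{2 N_r}} + \sum_{i=1}^{M} \sum_{j \ne i} \frac{P_i P_j}{(1 - P_0)^2} \frac{e^{\mathrm{tr}\left( Y^* (2I - (I + X_i X_i^*)^{-1} - (I + X_j X_j^*)^{-1}) Y \right)}}{|I + X_i X_i^*|^{N_r}\, |I + X_j X_j^*|^{N_r}}$$

$$\qquad\qquad - 2 \sum_{i=1}^{M} \frac{P_i}{1 - P_0} \frac{e^{\mathrm{tr}\left( Y^* (I - (I + X_i X_i^*)^{-1}) Y \right)}}{|I + X_i X_i^*|^{N_r}} + 1 \Bigg]. \tag{117}$$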

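The above expression can be evaluated using the result from [38] that if $z$ is $\mathcal{CN}(0, K)$ distributed,
then $E_z[\exp(z^* A z)] = \{\det(I - KA)\}^{-1}$ if $I - KA$ is positive definite. Otherwise, the expectation
diverges. Hence (117) becomes

$$\Delta(\bar{P}_Y \,\|\, P_{Y|X=0}) = 1 + \sum_{i=1}^{M} \frac{P_i^2}{(1 - P_0)^2} \frac{1}{|I + X_i X_i^*|^{2 N_r}\, \left| I - (2I - 2 (I + X_i X_i^*)^{-1}) \right|^{N_r}}$$

$$\qquad + \sum_{i=1}^{M} \sum_{j \ne i} \frac{P_i P_j}{(1 - P_0)^2} \frac{1}{|I + X_i X_i^*|^{N_r}\, |I + X_j X_j^*|^{N_r}\, \left| (I + X_i X_i^*)^{-1} + (I + X_j X_j^*)^{-1} - I \right|^{N_r}}$$

$$\qquad - 2 \sum_{i=1}^{M} \frac{P_i}{1 - P_0} \frac{1}{|I + X_i X_i^*|^{N_r}\, |I + X_i X_i^*|^{-N_r}}, \tag{118}$$

if $I - X_i X_i^* X_j X_j^*$ is positive definite $\forall\, i, j$, and $\infty$ otherwise. Simplification of (118), using the identity
$|I + A|\, |I + B|\, |(I + A)^{-1} + (I + B)^{-1} - I| = |I - AB|$ with $A = X_i X_i^*$ and $B = X_j X_j^*$, results in
$\Delta(\bar{P}_Y \,\|\, P_{Y|X=0})$ given in Theorem 4. ■
Normalized Energy per Bit Eb/N0 (bits/Joule)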
Fig. 1. Plot of spectral efficiency vs. energy/bit of STORM using the second order approximation of $I(P)$ in (95), for different values of $K N_t T$.

Fig. 2. Plot of spectral efficiency vs. energy/bit of STORM using the second order approximation of $I(P)$ in (95), for different values of $N_r$.

Bits transmitted reliably per joule, T = 8, Nr = 2